Open Gozala opened 1 month ago
Just realized that datomic does not have aggregators in :where
clause and instead has them in:find
. This is interesting because it eliminates the problem in queries like
const x = DB.variable()
const xs = DB.varibale()
export query = {
select: { x, xs },
where: [
DB.match([x, 'type', 'list/item']),
DB.aggregate.count(x, xs)
]
}
Because they become more like this
const x = DB.variable()
const xs = DB.varibale()
export const query = {
select: { x, xs: DB.aggregate.count(x) },
where: [
DB.match([x, 'type', 'list/item'])
]
}
Howover in datomic when aggregation is used other bindings are used for grouping
I encountered interesting flow in my design here is an example dataset
const groceries = DB.Link.of({ name: 'Groceries' })
const milk = DB.Link.of({ title: 'Buy Milk' })
const eggs = DB.Link.of({ title: 'Buy Eggs' })
const bread = DB.Link.of({ title: 'Buy Bread' })
const chores = DB.Link.of({ name: 'Chores' })
const laundry = DB.Link.of({ title: 'Do Laundry' })
const dishes = DB.Link.of({ title: 'Do Dishes' })
const db = DB.Memory.create([
[groceries, 'name', 'Groceries'],
[milk, 'title', 'Buy Milk'],
[eggs, 'title', 'Buy Eggs'],
[bread, 'title', 'Buy Bread'],
[groceries, 'todo', milk],
[groceries, 'todo', eggs],
[groceries, 'todo', bread],
[chores, 'name', 'Chores'],
[laundry, 'title', 'Do Laundry'],
[dishes, 'title', 'Do Dishes'],
[chores, 'todo', laundry],
[chores, 'todo', dishes],
])
If we run query like this on it
const list = DB.link()
const item = DB.link()
const title = DB.string()
const name = DB.string()
const todo = DB.variable()
DB.query(db, {
select: {
list,
name,
todo,
},
where: [
DB.match([list, 'name', name]),
DB.match([list, 'todo', item]),
DB.match([item, 'title', title]),
includes(todo, item),
],
})
We are not going to get
[
{ list: groceries, name: 'Groceries', todo: [milk, egg, bread] },
{ list: chores, name: 'Chores', todo: [laundry, dishes]
]
Instead we get
{
{ list: chores, name: 'Chores', todo: [milk, egg, bread, laundry, dishes] },
}
Although in datascript with this query
[:find ?list ?name (distinct ?child)
:where
[?list "name" ?name]
[?list "todo" ?item]
[?item "title", ?title]
[(tuple ?item ?title) ?child]]
We get following as proper grouping takes place
[
[1 "Groceries" [[2 "Buy Milk"] [3 "Buy Eggs"] [4 "Buy Bread"]]]
[5 "Chores" [[6 "Do Laundry"] [7 "Do Dishes"]]]
]
I think datomic attempts to unify frames except for the member that is aggregated, this also explains why :find ?list ?name (distinct ?child) ?title
produces distinct results as frames do not unify
[
[1 "Groceries" [[2 "Buy Milk"]] "Buy Milk"]
[1 "Groceries" [[3 "Buy Eggs"]] "Buy Eggs"]
[1 "Groceries" [[4 "Buy Bread"]] "Buy Bread"]
[5 "Chores" [[6 "Do Laundry"]] "Do Laundry"]
[5 "Chores" [[7 "Do Dishes"]] "Do Dishes"]
]
We can copy datomic here and push aggregation into select
or we can make our aggregation where clause do frame unification, although I'm not sure if that is how it would work in regular datalog.
I end up implementing simple solution for now that looks like this
DB.query(db, {
select: {
id: list,
name,
todo: [{id: item, title}]
},
where: [
DB.match([list, 'name', name]),
DB.match([list, 'todo', item]),
DB.match([item, 'title', title]),
],
})
ℹ️ An array wrapper [{id: item, title}]
in the selector implies that matches must be collected
@seefeldb brought up an interesting question about modeling empty collections. Which in would typically manifest as either []
or undefined
. Here however as illustrated by #35 above query would produce no results because second where
clause would eliminate prior matches.
This is really annoying and tricky because:
where
clause) and there is no obvious way to make it return []
because selection happens after matching, meaning frames will be eliminated before selection runs.
item
would need to be moved into optional group.where
clause as I was pursuing originally we would had to be explicit about optionality.Perhaps way to go about it would be to develop support for datomic pull like query interface. That would potentially give us more flexibility on how to interpret things e.g. same example in pull inspired notation could look something like
[
"id": {},
"name": {},
"todo": {
"title": {
"default": "untitled"
},
"done": {
"default": false,
}
}
]
Reverse lookups are kind of tricky in such syntax and other more complicated cases, but it would make empty aggregates and alike far more simple to model
Aggregators are not only convenient but necessary to do certain things e.g.
count
matching relations or collect them into a collection. SICP does not have this (unless I've missed it) which is why it was not implemented here.Thinking more about it I think I have an implementation idea that fits with SICP design.
evaluateAggregate
path inside the evaluator.It will extend a
Clause
with{Aggregate: Aggregate}
whereAggregate
is likeinput
from every frame and computeoutput
.Unknowns
It is not clear to me what to do with variables that got aggregated. If they are in the selector that would imply that we need to produce frame per each variable, but if they are not in the selector it would make sense to produce only one frame instead.
Should look into datomic / datascript to see what they do here