cozodb / cozo

A transactional, relational-graph-vector database that uses Datalog for query. The hippocampus for AI!
https://cozodb.org
Mozilla Public License 2.0
3.24k stars 92 forks source link

Fix indexes when join is needed #255

Closed mateusvmv closed 3 months ago

mateusvmv commented 3 months ago

Fixes #254 Fixed the first query, with the correct binding of "a" to "2", where it would previously bind "1" (which is bound to b) to "**2".

 stratum | rule_idx | rule | atom_idx | op                   | ref         | joins_on    | filters/expr | out_relation 
---------+----------+------+----------+----------------------+-------------+-------------+--------------+-------------------
 0       | 0        | "?"  | 6        | "load_stored"        | ":source"   | null        | []           | ["x", "b"] 
 0       | 0        | "?"  | 5        | "load_stored"        | ":target:b" | null        | []           | ["**1", "**2"] 
 0       | 0        | "?"  | 4        | "stored_prefix_join" | null        | {"b":"**1"} | null         | ["b", "**2"] 
 0       | 0        | "?"  | 3        | "load_stored"        | ":target"   | null        | []           | ["a", "c", "**0"] 
 0       | 0        | "?"  | 2        | "stored_prefix_join" | null        | {"**2":"a"} | null         | ["b", "a", "c"] 
 0       | 0        | "?"  | 1        | "reorder"            | null        | null        | null         | ["a", "b", "c"] 
 0       | 0        | "?"  | 0        | "out"                | null        | null        | null         | ["a", "b", "c"] 

Fixed the second query,with the introduction of the filter, where it would previously ignore the bindings not present in the index.

 stratum | rule_idx | rule | atom_idx | op                   | ref      | joins_on     | filters/expr    | out_relation 
---------+----------+------+----------+----------------------+----------+--------------+-----------------+---------------------------
 0       | 0        | "?"  | 10       | "load_stored"        | ":rel"   | null         | []              | ["~1", "a1", "a2"] 
 0       | 0        | "?"  | 9        | "load_stored"        | ":rel:a" | null         | []              | ["**1", "**2"] 
 0       | 0        | "?"  | 8        | "stored_prefix_join" | null     | {"a2":"**1"} | null            | ["a1", "a2", "**2"] 
 0       | 0        | "?"  | 7        | "load_stored"        | ":rel"   | null         | []              | ["~2", "**0", "a3"] 
 0       | 0        | "?"  | 6        | "stored_prefix_join" | null     | {"**2":"~2"} | null            | ["a1", "a2", "a3"] 
 0       | 0        | "?"  | 5        | "load_stored"        | ":rel:a" | null         | []              | ["**5", "**6"] 
 0       | 0        | "?"  | 4        | "stored_prefix_join" | null     | {"a3":"**5"} | null            | ["a1", "a2", "a3", "**6"] 
 0       | 0        | "?"  | 3        | "load_stored"        | ":rel"   | null         | []              | ["~3", "**3", "**4"] 
 0       | 0        | "?"  | 2        | "stored_prefix_join" | null     | {"**6":"~3"} | null            | ["a1", "a2", "a3", "**4"] 
 0       | 0        | "?"  | 1        | "filter"             | null     | null         | ["eq(a1, **4)"] | ["a1", "a2", "a3"] 
 0       | 0        | "?"  | 0        | "out"                | null     | null         | null            | ["a1", "a2", "a3"] 

This makes the use of index, when not all fields are available in the index relation, compile down to:

  1. A load on the index
  2. A join of the index to the original relation
  3. A filter on the bindings that weren't present in the index