cozodb / cozo

A transactional, relational-graph-vector database that uses Datalog for query. The hippocampus for AI!
https://cozodb.org
Mozilla Public License 2.0
3.24k stars 92 forks source link

Wrong fields being used on index #254

Closed mateusvmv closed 3 months ago

mateusvmv commented 3 months ago

Cozo is using the wrong fields with indexes in a specific scenario.

To reproduce, use the following schema: :create source { x => b } :create target { a => c, b } ::index create target:b { b, a }

And check the query plan

::explain {
    ?[a,b,c] :=
        *source{x,b},
        *target{a,b,c}
}
stratum rule_idx rule atom_idx op ref joins_on filters/expr out_relation
0 0 ? 6 load_stored :source null [] ["x","b"]
0 0 ? 5 load_stored :target:b null [] ["1","2"]
0 0 ? 4 stored_prefix_join null {"b":"**1"} null ["b","**2"]
0 0 ? 3 load_stored :target null [] ["a","c","**0"]
0 0 ? 2 stored_mat_join null {"2":"0"} null ["b","a","c"]
0 0 ? 1 reorder null null null ["a","b","c"]
0 0 ? 0 out null null null ["a","b","c"]

target:b has the format [b,a], so 2 is column a target has format [a, c, b], so 0 is column b The plan here is trying to join on two different columns

If we change the query by adding the index explicitly, it uses a correct plan, but includes b on the join. There is no need to include b, because a determines b. If b isn't right after a, it results in a matrix join, making the operation more expensive.

::explain {
    ?[a,b,c] :=
        *source{x,b},
        *target:b{a,b},
        *target{a,b,c}
}
stratum rule_idx rule atom_idx op ref joins_on filters/expr out_relation
0 0 ? 6 load_stored :source null [] ["x","b"]
0 0 ? 5 load_stored :target:b null [] ["**0","a"]
0 0 ? 4 stored_prefix_join null {"b":"**0"} null ["b","a"]
0 0 ? 3 load_stored :target null [] ["1","c","2"]
0 0 ? 2 stored_mat_join null {"a":"1","b":"2"} null ["b","a","c"]
0 0 ? 1 reorder null null null ["a","b","c"]
0 0 ? 0 out null null null ["a","b","c"]

The optimal plan is obtained with the following query

::explain {
    ?[a,b,c] :=
        *source{x,b},
        *target:b{a,b},
        *target{a,c}
}
stratum rule_idx rule atom_idx op ref joins_on filters/expr out_relation
0 0 ? 6 load_stored :source null [] ["x","b"]
0 0 ? 5 load_stored :target:b null [] ["**0","a"]
0 0 ? 4 stored_prefix_join null {"b":"**0"} null ["b","a"]
0 0 ? 3 load_stored :target null [] ["**1","c","~1"]
0 0 ? 2 stored_prefix_join null {"a":"**1"} null ["b","a","c"]
0 0 ? 1 reorder null null null ["a","b","c"]
0 0 ? 0 out null null null ["a","b","c"]
mateusvmv commented 3 months ago

This doubles up with another issue:

:create rel {x=>a,b}
::index create rel:a {a,x}
::explain { ?[a1,a2,a3] := *rel{a:a1,b:a2},*rel{a:a2,b:a3},*rel{a:a3,b:a1} }

This is supposed to bind two variables in the last query, but it only binds one. In that case, I think it should choose a prefix join on the index and a mat join on the relation, using three atoms, if that is acceptable.

I'm working on a fix for both and will come up with a PR soon.