IITDBGroup / gprom

GProM is a middleware that adds support for provenance to database backends.
http://www.cs.iit.edu/%7edbgroup/research/gprom.php
Apache License 2.0
8 stars 5 forks source link

Semantic optimization for DL lineage removes comparison predicates #90

Closed lordpretzel closed 1 year ago

lordpretzel commented 1 year ago

When computing lineage for a datalog query and activating semantic optimization, if the semantic optimization ends up rewriting the query, then comparison predicates are dopped.

Q(X) :- R(X,Y), S(Y,Z), Y < 5, Z < 10. ANS: Q. LINEAGE FOR R.

gets rewritten into

q(x) :- r(x,y),s(y,z),((y < 5)),((z < 10)).
prov_r(x,y) :- r(x,y),r(x,V1),s(V1,V2),((V1 < 5)),((V2 < 10)).

but should be

q(x) :- r(x,y),s(y,z),((y < 5)),((z < 10)).
prov_r(x,y) :- r(x,y),((y < 5)),r(x,V1),s(V1,V2),((V1 < 5)),((V2 < 10)).
lordpretzel commented 1 year ago

For comparison we have to distinguish between multiple 5 cases:

1) comparisons that only refer variables from atoms we are keeping and these variable are implied by the head of the query => such comparison are not required for correctness since there is a single possible value for these variables for a given result tuple, but they may improve performance, e.g., by using an index in some cases so we will keep them [KEEP]

2) comparisons that only refer variables from atoms we are keeping which are not implied by the head variables of the query => these comparisons may filter tuples from the provenance so we have to keep them [KEEP].

3) comparisons using only variables from atoms we have removed => these can be savely removed because if we remove atoms then we know their variables are implied by the head variables, so they do not filter. [REMOVE]

4) comparisons using some variables from atoms that are removed and some that we are keeping with variables we are keeping are implied by the query's head variables. => using the same argument as in 1) and given that we only remove atoms whose variables are implied by the head variables, it is safe to remove these comparisons. [REMOVE]

5) comparisons using some variables from atoms that are removed and some that we are keeping with variables we are keeping are not implied by the query's head variables. => This case represents a problem, we have to apply the comparison predicate, but have already removed some atoms that bind the variables. Ideally, we should check this when removing atoms, but for now we just return the unoptimized rule as a fallback. [DO NOT OPTIMIZE]

This is now implemented