IITDBGroup / gprom

GProM is a middleware that adds support for provenance to database backends.
http://www.cs.iit.edu/%7edbgroup/research/gprom.php
Apache License 2.0

rewrites for lineage for programs with aggregation and multiple levels #91

Open lordpretzel opened 1 year ago

The problem shows up for the following query:

Q1(l_okey) :-
lineitem(l_okey,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,l_comm_date,l_receipt_date,x11,x12,x13),
l_comm_date < l_receipt_date.

Q(o_opriority,count(1)) :-
orders(l_okey,x1,x2,x3,o_odate,o_opriority,x4,x5,x6),
Q1(l_okey),
o_odate >= '1993-07-01',
o_odate < '1993-10-01'.

ANS:Q.

RP(x1, x2) :- rtpcq04(x1, x2).

LINEAGE FOR lineitem FOR RESULTS FROM RP.

The regular program is:

PROGRAM:
        q1(l_okey) :- lineitem(l_okey,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,l_comm_date,l_receipt_date,x11,x12,x13),((l_comm_date < l_receipt_date)).
        q(o_opriority,count(1)) :- orders(l_okey,x1,x2,x3,o_odate,o_opriority,x4,x5,x6),@q1(l_okey),((o_odate >= '1993-07-01')),((o_odate < '1993-10-01')).
        rp(x1,x2) :- rtpcq04(x1,x2).
ANSWER RELATION:
        q
FDS:
        lineitem: l_orderkey, l_linenumber->l_partkey, l_suppkey, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment
        orders: o_orderkey->o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk, o_shippriority, o_comment
COMPUTING LINEAGE OF: q
ONLY FOR RESULTS STORED IN PREDICATE: rp
SHOW LINEAGE FOR EDB RELATION: lineitem

which is rewritten into:

PROGRAM:
        q1(l_okey) :- lineitem(l_okey,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,l_comm_date,l_receipt_date,x11,x12,x13),((l_comm_date < l_receipt_date)).
        q(o_opriority,count(1)) :- orders(l_okey,x1,x2,x3,o_odate,o_opriority,x4,x5,x6),q1(l_okey),((o_odate >= '1993-07-01')),((o_odate < '1993-10-01')).
        rp(x1,x2) :- rtpcq04(x1,x2).
        prov_q1(l_okey) :- orders(l_okey,x1,x2,x3,o_odate,o_opriority,x4,x5,x6),q1(l_okey),((o_odate >= '1993-07-01')),((o_odate < '1993-10-01')),rp(o_opriority,V0).
        prov_lineitem(l_okey,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,l_comm_date,l_receipt_date,x11,x12,x13) :- lineitem(l_okey,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,l_comm_date,l_receipt_date,x11,x12,x13),((l_comm_date < l_receipt_date)),q1(l_okey).
ANSWER RELATION:
        prov_lineitem

prov_q1 determines which q1 values are part of the provenance of a query result matching rp. However, the rule for prov_lineitem still joins with q1 rather than prov_q1, so q1 results that do not contribute to any final result in rp are not filtered out.
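
To make the difference concrete, here is a minimal Python sketch that evaluates the rules by hand on toy data. It is only an illustration under assumptions: the relations are cut down to the attributes the rules actually use, all tuples are invented, `rp` is filled directly instead of reading from rtpcq04, and the "expected" variant (filtering with prov_q1) is my reading of the behavior this issue asks for, not confirmed GProM output.

```python
# Toy model of the rewritten program from this issue.
# Relations keep only the attributes the rules use; all data is made up.
lineitem = [  # (l_okey, l_comm_date, l_receipt_date)
    (1, "1993-07-10", "1993-07-15"),
    (2, "1993-07-10", "1993-07-15"),
    (3, "1993-08-01", "1993-07-20"),  # fails l_comm_date < l_receipt_date
]
orders = [  # (l_okey, o_odate, o_opriority)
    (1, "1993-08-01", "1-URGENT"),  # inside the date range
    (2, "1994-01-01", "1-URGENT"),  # outside the date range
]
rp = {("1-URGENT", 1)}  # hypothetical stand-in for the contents of rtpcq04


def in_range(o_odate):
    # ISO-formatted dates compare correctly as plain strings.
    return "1993-07-01" <= o_odate < "1993-10-01"


# q1(l_okey) :- lineitem(...), l_comm_date < l_receipt_date.
q1 = {k for (k, comm, recv) in lineitem if comm < recv}

# prov_q1(l_okey) :- orders(...), q1(l_okey), date filters, rp(o_opriority, V0).
rp_priorities = {prio for (prio, _count) in rp}
prov_q1 = {
    k
    for (k, odate, prio) in orders
    if k in q1 and in_range(odate) and prio in rp_priorities
}


def prov_lineitem(keys):
    # prov_lineitem(...) :- lineitem(...), l_comm_date < l_receipt_date, <keys>(l_okey).
    return [t for t in lineitem if t[1] < t[2] and t[0] in keys]


current = prov_lineitem(q1)        # what the rewrite produces today
expected = prov_lineitem(prov_q1)  # assumed intent: filter with prov_q1

print("current :", current)
print("expected:", expected)
```

With this data, the q1-based rule also returns the lineitem tuple for order 2, even though that order falls outside the date range and therefore cannot contribute to any result in rp. Filtering with prov_q1 keeps only the tuple for order 1.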