IITDBGroup / gprom

GProM is a middleware that adds support for provenance to database backends.
http://www.cs.iit.edu/%7edbgroup/research/gprom.php
Apache License 2.0

Optimize non-ALL versions of set operations in Postgres serializer #66

Open lordpretzel opened 3 years ago

lordpretzel commented 3 years ago

GProM's algebra uses bag versions of set operations. A set version in the input query is translated into the bag version plus duplicate elimination. When translating back in the serializer, we currently always emit the bag version. We should check whether a set operation is used in conjunction with duplicate elimination and, if that is the case, emit the set version instead. For example,

SELECT * FROM r UNION SELECT * FROM s;

is translated into

DuplicateRemoval
  Union
    Projection [a b ]
      TableAccess [r]
    Projection [c d ]
      TableAccess [s]

and then serialized into

SELECT  DISTINCT F0."a" AS "a", F0."b" AS "b"
FROM ((
SELECT F0."a" AS "a", F0."b" AS "b"
FROM "r" AS F0 UNION ALL
SELECT F0."c" AS "c", F0."d" AS "d"
FROM "s" AS F0)) F0

when instead it would be better to generate

SELECT   F0."a" AS "a", F0."b" AS "b"
FROM ((
SELECT F0."a" AS "a", F0."b" AS "b"
FROM "r" AS F0 UNION
SELECT F0."c" AS "c", F0."d" AS "d"
FROM "s" AS F0)) F0
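
The check described above could be sketched roughly as follows. This is a toy Python model, not GProM's actual serializer or data structures: the classes `TableAccess`, `SetUnion`, `DuplicateRemoval` and the function `serialize` are hypothetical names made up for illustration, and the generated SQL is simplified (no `F0` aliases on the branches). The idea is only to show the pattern match: when a duplicate removal sits directly on top of a bag union, emit `UNION` instead of `SELECT DISTINCT ... UNION ALL`.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical, simplified algebra operators (not GProM's real node types).
@dataclass
class TableAccess:
    table: str
    attrs: List[str]


@dataclass
class SetUnion:
    """Bag union, i.e. UNION ALL semantics."""
    left: "Op"
    right: "Op"


@dataclass
class DuplicateRemoval:
    child: "Op"


Op = Union[TableAccess, SetUnion, DuplicateRemoval]


def select_list(attrs: List[str]) -> str:
    return ", ".join(f'"{a}"' for a in attrs)


def attrs_of(op: Op) -> List[str]:
    """Output attributes; a union takes its names from the left input."""
    if isinstance(op, TableAccess):
        return op.attrs
    if isinstance(op, SetUnion):
        return attrs_of(op.left)
    return attrs_of(op.child)


def serialize(op: Op) -> str:
    if isinstance(op, TableAccess):
        return f'SELECT {select_list(op.attrs)} FROM "{op.table}"'
    if isinstance(op, SetUnion):
        # Operands are parenthesized so nested set operations keep their grouping.
        return f"({serialize(op.left)}) UNION ALL ({serialize(op.right)})"
    if isinstance(op, DuplicateRemoval):
        child = op.child
        if isinstance(child, SetUnion):
            # Pattern matched: duplicate removal directly over a bag union.
            # Merge both operators into the set version of the union.
            return f"({serialize(child.left)}) UNION ({serialize(child.right)})"
        # Fallback: plain duplicate elimination over a subquery.
        return (f"SELECT DISTINCT {select_list(attrs_of(child))} "
                f"FROM ({serialize(child)}) F0")
    raise TypeError(f"unsupported operator: {op!r}")


plan = DuplicateRemoval(
    SetUnion(TableAccess("r", ["a", "b"]), TableAccess("s", ["c", "d"]))
)
print(serialize(plan))
```

The sketch handles union only. The same merge is safe for intersection, but not for difference in general: `DISTINCT` over `EXCEPT ALL` can return rows that `EXCEPT` would drop (for inputs `{1, 1}` and `{1}` the former yields `{1}`, the latter nothing), so that case would need to be checked against how the set version is translated into the algebra.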