citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

TPC-H query support #1862

Open marcocitus opened 6 years ago

marcocitus commented 6 years ago

In the TPC-H benchmark, when creating 'lineitem' and 'orders' as distributed tables and the remaining tables as reference tables, the following queries are unsupported: Q13, Q17, Q20 and Q22 (each discussed below).

These queries require improvements to the join order planner (e.g. using join restriction information) and to recursive planning.

Q13: Join order planner should recognise that <reference table> LEFT JOIN <distributed table> requires re-partitioning both tables by the join column.

Alternatively, recursive planning could recursively plan a query of the form (SELECT <distributed_table>.* FROM <reference table> INNER JOIN <distributed table> ..) and then perform <reference table> LEFT JOIN <intermediate result>.
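For reference, a sketch of Q13 and of what the recursive-planning alternative would amount to if written by hand. The first statement is Q13 as given in the TPC-H specification, with the spec's validation values ('special', 'requests') substituted; those values are not from this thread. The second statement is only an illustration: the CTE name matched_orders is hypothetical, AS MATERIALIZED assumes PostgreSQL 12 or later, and the rewrite is equivalent only under the assumption that c_custkey is unique (as it is in TPC-H data, although the schema in this thread does not declare it). Whether a given Citus version accepts the second form is not something this thread states.

```sql
-- TPC-H Q13: customer (reference table) LEFT JOIN orders (distributed table).
SELECT c_count, count(*) AS custdist
FROM (
    SELECT c_custkey, count(o_orderkey)
    FROM customer LEFT OUTER JOIN orders
        ON c_custkey = o_custkey
       AND o_comment NOT LIKE '%special%requests%'
    GROUP BY c_custkey
) AS c_orders (c_custkey, c_count)
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

-- Hand-written analogue of the recursive-planning alternative:
-- first compute the matching orders rows with an INNER JOIN,
-- then LEFT JOIN the reference table against that intermediate result.
-- Assumes c_custkey is unique, so the INNER JOIN returns each order at most once.
WITH matched_orders AS MATERIALIZED (
    SELECT orders.*
    FROM customer INNER JOIN orders
        ON c_custkey = o_custkey
       AND o_comment NOT LIKE '%special%requests%'
)
SELECT c_count, count(*) AS custdist
FROM (
    SELECT c_custkey, count(o_orderkey)
    FROM customer LEFT OUTER JOIN matched_orders
        ON c_custkey = o_custkey
    GROUP BY c_custkey
) AS c_orders (c_custkey, c_count)
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;
```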

Q17: Join order planner should allow sublinks and recognise that lineitem (a distributed table) and part (a reference table) both need to be re-partitioned by partkey (semi-join clause) in order to handle the correlated subquery.
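For reference, Q17 as given in the TPC-H specification, with the spec's validation values ('Brand#23', 'MED BOX') substituted; those values are not from this thread. The correlated subquery on lineitem is the sublink mentioned above, and l_partkey = p_partkey is the semi-join clause.

```sql
-- TPC-H Q17: the inner SELECT is correlated with the outer part row.
SELECT sum(l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem, part
WHERE p_partkey = l_partkey
  AND p_brand = 'Brand#23'
  AND p_container = 'MED BOX'
  AND l_quantity < (
      SELECT 0.2 * avg(l_quantity)
      FROM lineitem
      WHERE l_partkey = p_partkey   -- correlation on partkey, not on l_orderkey
  );
```

Because lineitem is distributed by l_orderkey in the setup described here, rows sharing a partkey can sit in different shards, which is why the subquery cannot simply be evaluated shard by shard.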

Q20: Recursive planning should recognise that the subquery in the WHERE clause on part should be recursively planned because of the additional, correlated subquery in the WHERE clause on lineitem.

Join order planner should allow sublinks and recognise that lineitem (a distributed table), part and partsupp (both reference tables) all need to be re-partitioned by partkey (semi-join clause) in order to handle the join in the correlated subquery.
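For reference, Q20 as given in the TPC-H specification, with the spec's validation values ('forest', 1994-01-01, 'CANADA') substituted; those values are not from this thread. It contains both subqueries discussed above: the uncorrelated one on part and the correlated one on lineitem.

```sql
-- TPC-H Q20.
SELECT s_name, s_address
FROM supplier, nation
WHERE s_suppkey IN (
        SELECT ps_suppkey
        FROM partsupp
        WHERE ps_partkey IN (
                -- uncorrelated subquery on part (reference table)
                SELECT p_partkey
                FROM part
                WHERE p_name LIKE 'forest%'
            )
          AND ps_availqty > (
                -- correlated subquery on lineitem (distributed table)
                SELECT 0.5 * sum(l_quantity)
                FROM lineitem
                WHERE l_partkey = ps_partkey
                  AND l_suppkey = ps_suppkey
                  AND l_shipdate >= date '1994-01-01'
                  AND l_shipdate < date '1994-01-01' + interval '1' year
            )
    )
  AND s_nationkey = n_nationkey
  AND n_name = 'CANADA'
ORDER BY s_name;
```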

Q22: Recursive planning should recognise that the subquery in the WHERE clause on customer should be recursively planned because of the additional, correlated subquery in the WHERE clause on orders.

Join order planner should allow sublinks and recognise that orders (a distributed table) and customer (a reference table) both need to be re-partitioned by custkey (semi-join clause) in order to handle the join in the correlated subquery.
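For reference, Q22 as given in the TPC-H specification, with the spec's validation country codes substituted; those values are not from this thread. The scalar subquery on customer is uncorrelated, while the NOT EXISTS subquery on orders is correlated on custkey.

```sql
-- TPC-H Q22.
SELECT cntrycode, count(*) AS numcust, sum(c_acctbal) AS totacctbal
FROM (
    SELECT substring(c_phone FROM 1 FOR 2) AS cntrycode, c_acctbal
    FROM customer
    WHERE substring(c_phone FROM 1 FOR 2)
              IN ('13', '31', '23', '29', '30', '18', '17')
      AND c_acctbal > (
            -- uncorrelated subquery on customer (reference table)
            SELECT avg(c_acctbal)
            FROM customer
            WHERE c_acctbal > 0.00
              AND substring(c_phone FROM 1 FOR 2)
                      IN ('13', '31', '23', '29', '30', '18', '17')
          )
      AND NOT EXISTS (
            -- correlated subquery on orders (distributed by o_orderkey)
            SELECT *
            FROM orders
            WHERE o_custkey = c_custkey
          )
) AS custsale
GROUP BY cntrycode
ORDER BY cntrycode;
```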

marcocitus commented 3 years ago

Schema:

CREATE TABLE supplier (
        s_suppkey  INTEGER NOT NULL,
        s_name CHAR(25) NOT NULL,
        s_address VARCHAR(40) NOT NULL,
        s_nationkey INTEGER NOT NULL,
        s_phone CHAR(15) NOT NULL,
        s_acctbal NUMERIC NOT NULL,
        s_comment VARCHAR(101) NOT NULL
);

CREATE TABLE part (
        p_partkey INTEGER NOT NULL,
        p_name VARCHAR(55) NOT NULL,
        p_mfgr CHAR(25) NOT NULL,
        p_brand CHAR(10) NOT NULL,
        p_type VARCHAR(25) NOT NULL,
        p_size INTEGER NOT NULL,
        p_container CHAR(10) NOT NULL,
        p_retailprice NUMERIC NOT NULL,
        p_comment VARCHAR(23) NOT NULL
);

CREATE TABLE partsupp (
        ps_partkey INTEGER NOT NULL,
        ps_suppkey INTEGER NOT NULL,
        ps_availqty INTEGER NOT NULL,
        ps_supplycost NUMERIC NOT NULL,
        ps_comment VARCHAR(199) NOT NULL
);

CREATE TABLE customer (
        c_custkey INTEGER NOT NULL,
        c_name VARCHAR(25) NOT NULL,
        c_address VARCHAR(40) NOT NULL,
        c_nationkey INTEGER NOT NULL,
        c_phone CHAR(15) NOT NULL,
        c_acctbal NUMERIC NOT NULL,
        c_mktsegment CHAR(10) NOT NULL,
        c_comment VARCHAR(117) NOT NULL
);

CREATE TABLE orders (
        o_orderkey BIGINT NOT NULL,
        o_custkey INTEGER NOT NULL,
        o_orderstatus CHAR(1) NOT NULL,
        o_totalprice NUMERIC NOT NULL,
        o_orderdate DATE NOT NULL,
        o_orderpriority CHAR(15) NOT NULL,
        o_clerk CHAR(15) NOT NULL,
        o_shippriority INTEGER NOT NULL,
        o_comment VARCHAR(79) NOT NULL
);

CREATE TABLE lineitem (
        l_orderkey BIGINT NOT NULL,
        l_partkey INTEGER NOT NULL,
        l_suppkey INTEGER NOT NULL,
        l_linenumber INTEGER NOT NULL,
        l_quantity NUMERIC NOT NULL,
        l_extendedprice NUMERIC NOT NULL,
        l_discount NUMERIC NOT NULL,
        l_tax NUMERIC NOT NULL,
        l_returnflag CHAR(1) NOT NULL,
        l_linestatus CHAR(1) NOT NULL,
        l_shipdate DATE NOT NULL,
        l_commitdate DATE NOT NULL,
        l_receiptdate DATE NOT NULL,
        l_shipinstruct CHAR(25) NOT NULL,
        l_shipmode CHAR(10) NOT NULL,
        l_comment VARCHAR(44) NOT NULL
);

CREATE TABLE nation (
        n_nationkey INTEGER NOT NULL,
        n_name CHAR(25) NOT NULL,
        n_regionkey INTEGER NOT NULL,
        n_comment VARCHAR(152) NOT NULL
);

CREATE TABLE region (
        r_regionkey INTEGER NOT NULL,
        r_name CHAR(25) NOT NULL,
        r_comment VARCHAR(152) NOT NULL
);

SELECT create_distributed_table('lineitem', 'l_orderkey');
SELECT create_distributed_table('orders', 'o_orderkey');
SELECT create_reference_table('customer');
SELECT create_reference_table('part');
SELECT create_reference_table('partsupp');
SELECT create_reference_table('supplier');
SELECT create_reference_table('region');
SELECT create_reference_table('nation');
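
To confirm that the tables ended up distributed as intended, the Citus metadata catalog can be queried. This is a minimal sketch, assuming the pg_dist_partition catalog marks hash-distributed tables with partmethod 'h' and reference tables with 'n'; the output labels are illustrative only.

```sql
-- List each Citus table and how it is partitioned.
SELECT logicalrelid AS table_name,
       CASE partmethod
           WHEN 'h' THEN 'distributed (hash)'
           WHEN 'n' THEN 'reference'
           ELSE partmethod::text
       END AS table_kind
FROM pg_dist_partition
ORDER BY logicalrelid::text;
```

With the setup above, lineitem and orders would be expected to show up as hash-distributed and the other six tables as reference tables.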