cmu-db / optd

CMU-DB's Cascades optimizer framework
https://cmu-db.github.io/optd/
MIT License
349 stars 19 forks source link

feat: Filter Pushdown Rule #140

Closed jurplel closed 5 months ago

jurplel commented 6 months ago

This PR brings a filter pushdown heuristic rule, built on @AveryQi115's hybrid scheme.

Filter Pushdown Rule

Helper Functions

Testing Utilities

yliang412 commented 6 months ago

I was looking at Umbra's plan For Q7. Would it be helpful to push down hints for the (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE') predicate? You still need to keep the join, but you could push down FRANCE or GERMANY to both sides and reduce the number of things to look at.

TPC-H Q7:

SELECT
    supp_nation,
    cust_nation,
    l_year,
    SUM(volume) AS revenue
FROM
    (
        SELECT
            n1.n_name AS supp_nation,
            n2.n_name AS cust_nation,
            EXTRACT(YEAR FROM l_shipdate) AS l_year,
            l_extendedprice * (1 - l_discount) AS volume
        FROM
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        WHERE
            s_suppkey = l_suppkey
            AND o_orderkey = l_orderkey
            AND c_custkey = o_custkey
            AND s_nationkey = n1.n_nationkey
            AND c_nationkey = n2.n_nationkey
            AND (
                (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY')
                OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')
            )
            AND l_shipdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
    ) AS shipping
GROUP BY
    supp_nation,
    cust_nation,
    l_year
ORDER BY
    supp_nation,
    cust_nation,
    l_year;
jurplel commented 6 months ago

I was looking at Umbra's plan For Q7. Would it be helpful to push down hints for the (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE') predicate? You still need to keep the join, but you could push down FRANCE or GERMANY to both sides and reduce the number of things to look at. ...

This is a great example, but if you look at what Umbra is doing, this is actually a separate step from pushdown: Unoptimized:

image

Expression Simplification:

image

If the filter were simplified to a conjunction first, then the filter pushdown implementation would be able to operate on the clauses independently. Dealing with an in expression with two columns is a separate issue, but I'm not sure optd would simplify the expression like that.

yliang412 commented 6 months ago

I was looking at Umbra's plan For Q7. Would it be helpful to push down hints for the (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE') predicate? You still need to keep the join, but you could push down FRANCE or GERMANY to both sides and reduce the number of things to look at. ...

This is a great example, but if you look at what Umbra is doing, this is actually a separate step from pushdown: Unoptimized: image Expression Simplification: image

If the filter were simplified to a conjunction first, then the filter pushdown implementation would be able to operate on the clauses independently. Dealing with an in expression with two columns is a separate issue, but I'm not sure optd would simplify the expression like that.

Yep, I agree that this seems to be relevant to expression optimizations.

jurplel commented 5 months ago

@Sweetsuro all comments finally addressed—approve and i will merge it!