Closed jurplel closed 5 months ago
I was looking at Umbra's plan for Q7. Would it be helpful to push down hints for the (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE') predicate? You still need to keep the join, but you could push down FRANCE or GERMANY to both sides and reduce the number of rows to look at.
TPC-H Q7:
SELECT
    supp_nation,
    cust_nation,
    l_year,
    SUM(volume) AS revenue
FROM
    (
        SELECT
            n1.n_name AS supp_nation,
            n2.n_name AS cust_nation,
            EXTRACT(YEAR FROM l_shipdate) AS l_year,
            l_extendedprice * (1 - l_discount) AS volume
        FROM
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        WHERE
            s_suppkey = l_suppkey
            AND o_orderkey = l_orderkey
            AND c_custkey = o_custkey
            AND s_nationkey = n1.n_nationkey
            AND c_nationkey = n2.n_nationkey
            AND (
                (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY')
                OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')
            )
            AND l_shipdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
    ) AS shipping
GROUP BY
    supp_nation,
    cust_nation,
    l_year
ORDER BY
    supp_nation,
    cust_nation,
    l_year;
This is a great example, but if you look at what Umbra is doing, this is actually a separate step from pushdown:
[Figure: Umbra plan, Unoptimized]
[Figure: Umbra plan, Expression Simplification]
If the filter were simplified to a conjunction first, then the filter pushdown implementation would be able to operate on the clauses independently. Dealing with an IN expression with two columns is a separate issue, but I'm not sure optd would simplify the expression like that.
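To make the simplification step concrete, here is a minimal Rust sketch of deriving the single-table filters that the Q7 disjunction implies. It is neither optd's nor Umbra's code: `EqAtom`, `Dnf`, `implied_values`, and `q7_predicate` are hypothetical names, and the predicate is assumed to already be an OR of ANDs of string equality comparisons. A column gets an implied filter only when every disjunct constrains it; the original predicate still has to be kept above the join.

```rust
use std::collections::BTreeSet;

/// One equality atom such as `n1.n_name = 'FRANCE'` (toy representation).
#[derive(Debug, Clone, PartialEq)]
struct EqAtom {
    column: &'static str, // qualified column name, e.g. "n1.n_name"
    value: &'static str,  // string literal the column is compared against
}

/// A predicate in disjunctive form: an OR over disjuncts, each an AND of atoms.
type Dnf = Vec<Vec<EqAtom>>;

/// If every disjunct constrains `column`, return the set of values the column
/// may take. That set is a filter implied by the whole OR, so it can be pushed
/// down to the scan of that table. If any disjunct leaves the column
/// unconstrained, nothing is implied and `None` is returned.
fn implied_values(pred: &Dnf, column: &str) -> Option<BTreeSet<&'static str>> {
    if pred.is_empty() {
        return None; // be conservative: derive nothing from an empty OR
    }
    let mut values = BTreeSet::new();
    for disjunct in pred {
        let mut constrained = false;
        for atom in disjunct.iter().filter(|a| a.column == column) {
            values.insert(atom.value);
            constrained = true;
        }
        if !constrained {
            return None;
        }
    }
    Some(values)
}

/// The nation predicate from TPC-H Q7.
fn q7_predicate() -> Dnf {
    vec![
        vec![
            EqAtom { column: "n1.n_name", value: "FRANCE" },
            EqAtom { column: "n2.n_name", value: "GERMANY" },
        ],
        vec![
            EqAtom { column: "n1.n_name", value: "GERMANY" },
            EqAtom { column: "n2.n_name", value: "FRANCE" },
        ],
    ]
}
```

For Q7, `implied_values(&q7_predicate(), "n1.n_name")` yields the set containing FRANCE and GERMANY, and the same holds for `n2.n_name`. Each set can be treated as its own conjunct, which is what lets a pushdown rule handle it independently of the join.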
Yep, I agree that this seems to be relevant to expression optimizations.
@Sweetsuro all comments are finally addressed. Approve and I will merge it!
This PR brings a filter pushdown heuristic rule, built on @AveryQi115's hybrid scheme.
Filter Pushdown Rule
Helper Functions
LogOpExpr::new_flattened_nested_logical creates a new LogOpExpr from an ExprList, and it flattens any nested LogOpExprs of the same LogOpType.
Expr::rewrite_column_refs recursively rewrites any ColumnExpr in an expression tree, using a provided rewrite_fn.
LogicalJoin::map_through_join takes in left/right schema sizes, and maps an index to be as it would if it were pushed down to the left or right side of a join.
LogicalProjection::compute_column_mapping creates a ColumnMapping object from a LogicalProjection.
The ColumnMapping object has a few methods, but most importantly it has rewrite_condition, which given an expr, will rewrite the expression with the projection's mapping.
Testing Utilities
new_test_optimizer creates a new heuristic optimizer, which applies a given rule. It uses a TpchCatalog.
TpchCatalog is a catalog implementing a couple of tables from the TPC-H schema. It can be extended to have more as needed.
DummyCostModel implements a cost model, only giving zero cost. It is used for constructing a cascades optimizer without a real cost model, and isn't used in this PR.
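As a rough illustration of how the helper functions listed above fit together, here is a self-contained Rust sketch on a toy expression tree. The `Expr` enum, `JoinSide`, `new_flattened_and`, and `push_to_right` are hypothetical stand-ins written for this example; the signatures (including returning `Option` when a column cannot be rewritten) are assumptions and do not mirror optd's actual types.

```rust
/// Toy expression tree; a stand-in for optd's types, not their real definitions.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize), // column reference by index into the input schema
    Const(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

/// Append `children` to `out`, splicing in the children of any nested AND.
fn flatten_and_into(children: Vec<Expr>, out: &mut Vec<Expr>) {
    for child in children {
        match child {
            Expr::And(inner) => flatten_and_into(inner, out),
            other => out.push(other), // ORs and leaves are kept as they are
        }
    }
}

/// Build an AND whose nested ANDs are flattened into a single level.
fn new_flattened_and(children: Vec<Expr>) -> Expr {
    let mut flat = Vec::new();
    flatten_and_into(children, &mut flat);
    Expr::And(flat)
}

/// Recursively rewrite every column reference with `rewrite_fn`.
/// Returns `None` as soon as one column cannot be rewritten.
fn rewrite_column_refs(expr: &Expr, rewrite_fn: &dyn Fn(usize) -> Option<usize>) -> Option<Expr> {
    let rewrite_all = |children: &Vec<Expr>| -> Option<Vec<Expr>> {
        children.iter().map(|c| rewrite_column_refs(c, rewrite_fn)).collect()
    };
    Some(match expr {
        Expr::Column(i) => Expr::Column(rewrite_fn(*i)?),
        Expr::Const(c) => Expr::Const(*c),
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(rewrite_column_refs(l, rewrite_fn)?),
            Box::new(rewrite_column_refs(r, rewrite_fn)?),
        ),
        Expr::And(children) => Expr::And(rewrite_all(children)?),
        Expr::Or(children) => Expr::Or(rewrite_all(children)?),
    })
}

/// Which join input a column belongs to, with its index within that input.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinSide {
    Left(usize),
    Right(usize),
}

/// Map an index in the join output (left columns followed by right columns)
/// to the side it comes from and its index on that side.
fn map_through_join(index: usize, left_size: usize, right_size: usize) -> JoinSide {
    assert!(index < left_size + right_size, "column index out of range");
    if index < left_size {
        JoinSide::Left(index)
    } else {
        JoinSide::Right(index - left_size)
    }
}

/// Rewrite `expr` so it can be evaluated below the join on the right input.
/// Succeeds only if every referenced column comes from the right side.
fn push_to_right(expr: &Expr, left_size: usize, right_size: usize) -> Option<Expr> {
    rewrite_column_refs(expr, &|i: usize| match map_through_join(i, left_size, right_size) {
        JoinSide::Right(j) => Some(j),
        JoinSide::Left(_) => None,
    })
}
```

With a 3-column left input and a 2-column right input, a conjunct that compares output column 4 to a constant is rewritten to reference right-side column 1, while a conjunct that mentions columns from both sides returns `None` and has to stay at the join. Flattening first matters because it exposes each conjunct to this check individually.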