StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

[Optimization] Introduce a partial_apply_null method for Expr to short-circuit Join operators. #24570

Closed satanson closed 9 months ago

satanson commented 1 year ago

Enhancement

Consider queries such as the following:

-- non-correlated subquery, planned as a cross join
select a from t where t.b > (select max(b) from s);

-- inner join
select t.a, t.b from t join s on t.c = s.c and t.d < s.d;

If every build-side column used by the predicates of the cross join or inner join is entirely NULL, and partially applying NULL to those predicates yields a false or NULL result, the join operator can short-circuit the probe side. In the first example above, when "select max(b) from s" yields NULL, "t.b > NULL" yields NULL no matter what value t.b takes.
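The short-circuit can be illustrated with a minimal Python sketch of a nested-loop join for the first query. It is not StarRocks code: the function name `cross_join_gt` and the row representation (plain lists of optional integers) are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def cross_join_gt(
    probe: Iterable[Optional[int]],
    build: List[Optional[int]],
) -> Iterator[Tuple[int, int]]:
    """Hypothetical nested-loop join emitting (t_b, s_b) where t_b > s_b."""
    # Short-circuit: if every build-side value is NULL, `t_b > NULL` is NULL
    # for any t_b, so no row can qualify and the probe side is never read.
    if all(s_b is None for s_b in build):
        return
    for t_b in probe:
        for s_b in build:
            # SQL semantics: a comparison with NULL is NULL, which filters the row.
            if t_b is not None and s_b is not None and t_b > s_b:
                yield (t_b, s_b)


consumed = []


def probe_rows() -> Iterator[int]:
    """Probe-side scan that records which rows were actually read."""
    for v in [1, 5, 9]:
        consumed.append(v)
        yield v


# Build side is the single NULL produced by `select max(b) from s`.
rows_when_null = list(cross_join_gt(probe_rows(), [None]))
consumed_when_null = list(consumed)

# Build side holds a real value, so the probe side must be scanned.
rows_when_value = list(cross_join_gt(probe_rows(), [4]))
```

With an all-NULL build side the result is empty and `consumed_when_null` stays empty, meaning the probe scan was skipped entirely. With a build value of 4 the probe side is read in full.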

The result of Expr's partial_apply_null method can be categorized into three cases:

  1. false or null: e.g. a > null, where a is unknown. The join can use this result to short-circuit the probe side.
  2. true: e.g. (a > null) is null. This always-true expr can be eliminated.
  3. nondeterministic: e.g. a > null or b > c, where the result depends on the unknown columns. This optimization cannot use it.
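The three cases can be sketched with SQL three-valued logic in Python. This is only an illustration under stated assumptions, not the proposed StarRocks implementation: the expression classes (`Col`, `Gt`, `And`, `Or`, `IsNull`), the `Outcome` enum, and the signature of `partial_apply_null` are all hypothetical. The sketch computes the set of values a predicate could take once the listed columns are known to be NULL, treating sub-expressions as independent, which is conservative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Set, Union


class Outcome(Enum):
    FALSE_OR_NULL = "false or null"        # join may short-circuit the probe side
    TRUE = "true"                          # predicate can be eliminated
    NONDETERMINISTIC = "nondeterministic"  # no optimization possible


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Gt:
    left: Col
    right: Col


@dataclass(frozen=True)
class And:
    left: "Pred"
    right: "Pred"


@dataclass(frozen=True)
class Or:
    left: "Pred"
    right: "Pred"


@dataclass(frozen=True)
class IsNull:
    arg: "Pred"


Pred = Union[Gt, And, Or, IsNull]
Tri = Optional[bool]  # None stands for SQL NULL
ANY: FrozenSet[Tri] = frozenset({True, False, None})


def kleene_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False
    return None if a is None or b is None else True


def kleene_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True
    return None if a is None or b is None else False


def possible(p: Pred, null_cols: Set[str]) -> FrozenSet[Tri]:
    """Values p may take when the columns in null_cols are known to be NULL."""
    if isinstance(p, Gt):
        if p.left.name in null_cols or p.right.name in null_cols:
            return frozenset({None})  # any comparison with NULL is NULL
        return ANY                    # both operands unknown
    if isinstance(p, IsNull):
        return frozenset(v is None for v in possible(p.arg, null_cols))
    if isinstance(p, (And, Or)):
        op = kleene_and if isinstance(p, And) else kleene_or
        lhs, rhs = possible(p.left, null_cols), possible(p.right, null_cols)
        return frozenset(op(l, r) for l in lhs for r in rhs)
    raise TypeError(f"unsupported expression: {p!r}")


def partial_apply_null(p: Pred, null_cols: Set[str]) -> Outcome:
    vals = possible(p, null_cols)
    if True not in vals:
        return Outcome.FALSE_OR_NULL
    if vals == frozenset({True}):
        return Outcome.TRUE
    return Outcome.NONDETERMINISTIC


a_gt_s = Gt(Col("a"), Col("s_b"))  # s_b is the all-NULL build-side column
b_gt_c = Gt(Col("b"), Col("c"))
nulls = {"s_b"}

case1 = partial_apply_null(a_gt_s, nulls)
case2 = partial_apply_null(IsNull(a_gt_s), nulls)
case3 = partial_apply_null(Or(a_gt_s, b_gt_c), nulls)
case_and = partial_apply_null(And(a_gt_s, b_gt_c), nulls)
```

Here `a > NULL` falls into the first case, an always-true predicate such as `(a > NULL) IS NULL` into the second, and `a > NULL OR b > c` into the third. A conjunction such as `a > NULL AND b > c` also lands in the first case, because a NULL conjunct can never let the whole predicate evaluate to true.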

Correctness and performance tests should be conducted for this optimization.
