apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.22k stars 437 forks

[GLUTEN-7971][CH] Support using left side as the build table for the left anti/semi join #7981

Closed zzcclp closed 1 day ago

zzcclp commented 3 days ago

What changes were proposed in this pull request?

Vanilla Spark does not currently support right anti/semi joins, but the CH backend does. Using runtime statistics, this change converts A left anti/semi join B into B right anti/semi join A when AQE is on and table A is smaller than table B, so that the smaller left side becomes the build table.
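To make the idea concrete, here is a minimal sketch of a left semi/anti hash join that builds on the left table instead of the right. It is not the CH backend's implementation: the function names, the dict-based rows, and the single join column are assumptions made for illustration, and NULL keys are treated as never matching.

```python
from collections import defaultdict


def left_semi_anti_build_left(left_rows, right_rows, key, anti=False):
    """Left semi/anti join that hashes the LEFT table and probes with the right.

    Illustrative only: rows are dicts, `key` is a single join column, and
    NULL (None) keys never match.
    """
    # Build phase: hash the (smaller) left table, remembering row positions.
    buckets = defaultdict(list)
    for i, row in enumerate(left_rows):
        if row[key] is not None:
            buckets[row[key]].append(i)

    # Probe phase: stream the right table and flag every left row it hits.
    matched = [False] * len(left_rows)
    for row in right_rows:
        # pop() so each key's bucket is flagged at most once.
        for i in buckets.pop(row[key], ()):
            matched[i] = True

    # Emit phase: semi keeps flagged rows, anti keeps the unflagged ones.
    return [row for row, hit in zip(left_rows, matched) if hit != anti]


def left_semi_anti_build_right(left_rows, right_rows, key, anti=False):
    """Reference version that builds on the right table, for comparison."""
    right_keys = {r[key] for r in right_rows if r[key] is not None}
    return [l for l in left_rows if (l[key] in right_keys) != anti]
```

Both functions return the same rows; the difference is which table is held in memory. Building on the left only pays off when the left table is the smaller one, which is why the decision is tied to runtime statistics.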

Close #7971.

(Fixes: #7971)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

github-actions[bot] commented 3 days ago

https://github.com/apache/incubator-gluten/issues/7971

github-actions[bot] commented 3 days ago

Run Gluten Clickhouse CI on x86

zzcclp commented 2 days ago

> Have you run the CH backend TPC-DS benchmark? Although Velox supports right semi join, we found a regression with TPC-DS q14a/q14b. facebookincubator/velox#9980

Yeah, I ran the TPC-H SF100 benchmark, and there is also a regression on TPC-H Q21: from 33s to 41s with right semi join.

github-actions[bot] commented 2 days ago

Run Gluten Clickhouse CI on x86

lgbo-ustc commented 2 days ago

LGTM

zzcclp commented 2 days ago

> > Yeah, I ran the TPC-H SF100 benchmark, and there is also a regression on TPC-H Q21: from 33s to 41s with right semi join.
>
> Given the current performance regression, will the CH backend continue this work? Thanks.

We plan to add a parameter to control whether this feature is enabled.
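As a rough sketch of what such a switch could look like, the decision can be written as a small function. The thread does not name the parameter, so `feature_enabled` is a hypothetical flag, and the join-type strings and byte-size inputs are likewise assumptions for illustration.

```python
def choose_build_side(join_type, left_bytes, right_bytes,
                      aqe_enabled, feature_enabled):
    """Pick the build side for a hash join (illustrative sketch only).

    `feature_enabled` stands in for the planned, not yet named, parameter.
    Returns "left" or "right".
    """
    # Only left semi/anti joins are candidates for the rewrite.
    if join_type not in ("left_semi", "left_anti"):
        return "right"
    # Sizes come from runtime statistics, so AQE must be on, and the
    # feature switch must allow the rewrite.
    if not (aqe_enabled and feature_enabled):
        return "right"
    # Build on the left side only when it is strictly smaller.
    return "left" if left_bytes < right_bytes else "right"
```

With the flag off, the plan keeps the right side as the build table, which gives a way to avoid regressions such as the Q21 slowdown reported above.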

github-actions[bot] commented 2 days ago

Run Gluten Clickhouse CI on x86

zzcclp commented 1 day ago

Run Gluten Clickhouse CI on x86