apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[CH] Optimization for multi joins on the same keys #8007

Open lgbo-ustc opened 1 day ago

lgbo-ustc commented 1 day ago

Backend

CH (ClickHouse)

Bug description

We have encountered a case like the following:

select * from t1 left join t2 on t1.key = t2.key 
 left join t3 on t1.key = t3.key 
 left join t4 on t1.key = t4.key
 ....

This puts multiple joins into one execution stage, which can easily cause OOM because each join operator consumes a lot of memory. It is not easy to allocate memory properly for each join.
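To illustrate the memory pressure, here is a toy Python model of several left hash joins on the same key running in one stage. It is only a sketch, not Gluten or ClickHouse code: the helper names `build_hash_table` and `chained_left_joins`, and the idea of counting resident build-side rows, are assumptions made for illustration.

```python
from collections import defaultdict


def build_hash_table(rows, key="key"):
    """Group build-side rows by join key (the in-memory part of a hash join)."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def chained_left_joins(probe_rows, build_sides, key="key"):
    """Left-join probe_rows against every build side on the same key.

    Hypothetical model of a single stage: every hash table is built up
    front and stays resident until the whole stage has finished.
    """
    tables = {name: build_hash_table(rows, key) for name, rows in build_sides.items()}
    # Rough proxy for peak memory: build-side rows held at the same time.
    resident_rows = sum(len(rows) for rows in build_sides.values())

    results = [dict(row) for row in probe_rows]
    for name, table in tables.items():
        joined = []
        for left in results:
            # A left join keeps the probe row even when nothing matches.
            matches = table.get(left[key]) or [None]
            for right in matches:
                out = dict(left)
                out[name] = right  # None when there is no match
                joined.append(out)
        results = joined
    return results, resident_rows


t1 = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}]
t2 = [{"key": 1, "b": 10}]
t3 = [{"key": 1, "c": 20}, {"key": 2, "c": 21}]
t4 = []

rows, resident = chained_left_joins(t1, {"t2": t2, "t3": t3, "t4": t4})
print(len(rows), resident)
```

In this model the resident row count is the sum of all build sides, so it grows with every extra join added to the stage, which is the situation described above.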

Related issue: #8003

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

lgbo-ustc commented 1 day ago

We may have other ways to implement this without using joins.