apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.37k stars 1.2k forks

Add spilling support for HashJoin #12952

Open comphead opened 1 month ago

comphead commented 1 month ago

Is your feature request related to a problem or challenge?

It would be helpful to have spilling support for the HashJoin. If there is not enough memory on the machine, the join can use local disk to spill intermediate results.
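To make the idea concrete, here is a minimal sketch of one common way to spill a hash join (a partitioned, "grace"-style join). It is not DataFusion code and not a proposed design: the `Row` type, the function names, the tab-separated spill format, and the fixed partition count are all hypothetical, and it uses only the Rust standard library. It assumes payloads contain no newlines and that `num_partitions` is greater than zero. Both inputs are hashed on the join key into per-partition files, and then each pair of partition files is joined with an in-memory hash table, so only one partition of the build side is resident at a time.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::{Hash, Hasher};
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Hypothetical simplified row: a join key plus an opaque payload.
type Row = (u64, String);

/// Pick the spill partition for a key. Both sides must use the same function.
fn partition_of(key: u64, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Write rows to one file per partition under `dir` and return the file paths.
fn spill_partitions(
    rows: &[Row],
    dir: &Path,
    side: &str,
    num_partitions: usize,
) -> io::Result<Vec<PathBuf>> {
    let paths: Vec<PathBuf> = (0..num_partitions)
        .map(|p| dir.join(format!("{side}_{p}.spill")))
        .collect();
    let mut writers = Vec::with_capacity(num_partitions);
    for path in &paths {
        writers.push(BufWriter::new(File::create(path)?));
    }
    for (key, payload) in rows {
        let w = &mut writers[partition_of(*key, num_partitions)];
        // Assumed format: "<key>\t<payload>\n"; payload must not contain '\n'.
        writeln!(w, "{key}\t{payload}")?;
    }
    for w in &mut writers {
        w.flush()?;
    }
    Ok(paths)
}

/// Read one spilled partition back into memory.
fn read_partition(path: &Path) -> io::Result<Vec<Row>> {
    let mut rows = Vec::new();
    for line in BufReader::new(File::open(path)?).lines() {
        let line = line?;
        if let Some((key, payload)) = line.split_once('\t') {
            let key = key
                .parse::<u64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            rows.push((key, payload.to_string()));
        }
    }
    Ok(rows)
}

/// Inner equi-join: spill both sides, then join one partition pair at a time.
fn spilling_hash_join(
    build: &[Row],
    probe: &[Row],
    dir: &Path,
    num_partitions: usize,
) -> io::Result<Vec<(u64, String, String)>> {
    assert!(num_partitions > 0, "need at least one partition");
    fs::create_dir_all(dir)?;
    let build_files = spill_partitions(build, dir, "build", num_partitions)?;
    let probe_files = spill_partitions(probe, dir, "probe", num_partitions)?;

    let mut out = Vec::new();
    for (b, p) in build_files.iter().zip(&probe_files) {
        // Hash table over this partition's build rows only.
        let mut table: HashMap<u64, Vec<String>> = HashMap::new();
        for (key, payload) in read_partition(b)? {
            table.entry(key).or_default().push(payload);
        }
        // Matching keys always land in the same partition index on both sides.
        for (key, probe_payload) in read_partition(p)? {
            if let Some(matches) = table.get(&key) {
                for build_payload in matches {
                    out.push((key, build_payload.clone(), probe_payload.clone()));
                }
            }
        }
        fs::remove_file(b)?;
        fs::remove_file(p)?;
    }
    Ok(out)
}
```

The sketch takes in-memory slices as input purely for brevity and always spills. A real operator would consume streams of record batches and would likely spill only when a memory reservation fails, which is the part this issue asks for.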

For reference, see the spilling support for SortMergeJoin: https://github.com/apache/datafusion/issues/9359

Some ideas were also covered in https://github.com/apache/datafusion/issues/1599

Further reading: https://facebookincubator.github.io/velox/develop/spilling.html

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

comphead commented 1 month ago

@andygrove @viirya cc

demetribu commented 4 weeks ago

take