Open alamb opened 6 months ago
I'd like to try to help with this. :)
This one may be tricky, FWIW. The join code is not simple.
Aha, that seems true. Maybe I should leave this one for now and find something less difficult. I think I could fix it once I get more familiar with the code.❤️
Is your feature request related to a problem or challenge?
Similarly to https://github.com/apache/arrow-datafusion/issues/7848, @metesynnada noted in https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1903359773 that it is possible for NestedLoopsJoin to generate a single (very) large RecordBatch. For certain pathological queries this may lead to DataFusion far exceeding its memory limits and erroring out.
Describe the solution you'd like
Implement / adapt the same approach as @korowa did in https://github.com/apache/arrow-datafusion/pull/8020 (❤️ ) to incrementally create join output for joins that match many keys rather than doing it all at once.
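To make the idea of incremental output concrete, here is a minimal sketch in plain Rust. It is not DataFusion's code and not the approach from the linked PR: `Batch`, `ChunkedNestedLoopJoin`, and `batch_size` are hypothetical names, and a `Vec<(i32, i32)>` stands in for a `RecordBatch`. The point it illustrates is that the join remembers its position in the nested loop, so each call emits at most `batch_size` matched rows instead of materializing every match at once.

```rust
/// Hypothetical stand-in for a RecordBatch: a list of joined (left, right) rows.
type Batch = Vec<(i32, i32)>;

/// Streams nested-loop join output in chunks of at most `batch_size` rows.
/// (Illustrative only; names and layout are assumptions, not DataFusion APIs.)
struct ChunkedNestedLoopJoin<'a, F>
where
    F: Fn(i32, i32) -> bool,
{
    left: &'a [i32],
    right: &'a [i32],
    predicate: F,
    batch_size: usize,
    // Resume position: the next (left, right) index pair to examine.
    left_idx: usize,
    right_idx: usize,
}

impl<'a, F> ChunkedNestedLoopJoin<'a, F>
where
    F: Fn(i32, i32) -> bool,
{
    fn new(left: &'a [i32], right: &'a [i32], batch_size: usize, predicate: F) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { left, right, predicate, batch_size, left_idx: 0, right_idx: 0 }
    }
}

impl<'a, F> Iterator for ChunkedNestedLoopJoin<'a, F>
where
    F: Fn(i32, i32) -> bool,
{
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        let mut out = Vec::with_capacity(self.batch_size);
        while self.left_idx < self.left.len() {
            while self.right_idx < self.right.len() {
                let (l, r) = (self.left[self.left_idx], self.right[self.right_idx]);
                // Advance before a possible early return so the next call resumes here.
                self.right_idx += 1;
                if (self.predicate)(l, r) {
                    out.push((l, r));
                    if out.len() == self.batch_size {
                        return Some(out);
                    }
                }
            }
            // Right side exhausted for this left row; move to the next left row.
            self.left_idx += 1;
            self.right_idx = 0;
        }
        // Flush the final partial batch, if any.
        if out.is_empty() { None } else { Some(out) }
    }
}
```

With a 3-row left side, a 4-row right side, an always-true predicate, and `batch_size` of 5, this yields batches of 5, 5, and 2 rows rather than a single 12-row batch, so the largest output allocation is bounded by `batch_size` regardless of how many pairs match.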
Describe alternatives you've considered
No response
Additional context
No response