apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Support BroadcastNestedLoopJoinExec #198

Open · singhpk234 opened 3 months ago

singhpk234 commented 3 months ago

What is the problem the feature request solves?

DataFusion supports Cross and NestedLoop joins as well: https://docs.rs/datafusion-physical-plan/36.0.0/datafusion_physical_plan/joins/index.html

It would be really nice if we could add support for them, as was done for Hash and SortMergeJoin.

Describe the potential solution

No response

Additional context

SHJ (ShuffledHashJoin) PR

singhpk234 commented 3 months ago

@viirya, can I pick this up?

viirya commented 3 months ago

Sure. I haven't started working on this yet. Which one will you work on, NestedLoop or Cross Join? In Spark they are two different join operators, so I think we should have separate tickets for them instead of one.
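To illustrate the distinction between the two operators, here is a minimal in-memory sketch in Rust, assuming rows are plain integers: a nested loop join tests an arbitrary predicate against every pair of rows, while a cross join emits every pair unconditionally. The functions `nested_loop_join` and `cross_join` are hypothetical illustrations, not DataFusion or Spark APIs.

```rust
// Hypothetical sketch: semantics only, not DataFusion's or Spark's API.

/// Inner nested loop join: compare every left row with every right row
/// and keep the pairs for which `cond` returns true.
fn nested_loop_join<L: Clone, R: Clone>(
    left: &[L],
    right: &[R],
    cond: impl Fn(&L, &R) -> bool,
) -> Vec<(L, R)> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            if cond(l, r) {
                out.push((l.clone(), r.clone()));
            }
        }
    }
    out
}

/// Cross join: the Cartesian product, i.e. a nested loop join whose
/// condition is always true.
fn cross_join<L: Clone, R: Clone>(left: &[L], right: &[R]) -> Vec<(L, R)> {
    nested_loop_join(left, right, |_, _| true)
}

fn main() {
    let left = vec![1, 5, 10];
    let right = vec![3, 7];

    // Non-equi condition: hash join and sort-merge join need equality keys,
    // so a predicate like `l < r` cannot be handled by them.
    let nlj = nested_loop_join(&left, &right, |l, r| l < r);
    println!("{:?}", nlj); // [(1, 3), (1, 7), (5, 7)]

    // Every left row paired with every right row: 3 * 2 = 6 pairs.
    let cj = cross_join(&left, &right);
    println!("{}", cj.len()); // 6
}
```

In Spark's BroadcastNestedLoopJoinExec, one side is broadcast, so each partition of the streamed side runs this kind of loop locally against the full broadcast side.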

singhpk234 commented 3 months ago

I can start with BNLJ :)

singhpk234 commented 2 months ago

Still working on it; I will publish a PR by early next week. Apologies for the delay.

viirya commented 2 months ago

No problem. Thank you for working on this.

viirya commented 2 months ago

Note that I found several bugs in the current broadcast implementation while trying to enable broadcast by default in #213. Since BroadcastNestedLoopJoinExec uses broadcast, I suggest working on the operator after #213 is merged.

singhpk234 commented 2 months ago

> I suggest that you can work on the operator after https://github.com/apache/arrow-datafusion-comet/pull/213 is merged

Ack, let me rebase onto this branch in the meantime.