Closed (andygrove closed this 1 month ago)
It sounds reasonable. The vectorized sort-merge join (SMJ) implementation in DataFusion looks inefficient, and I'm not sure whether an optimized SMJ algorithm exists for vectorized execution. If not, replacing SMJ with shuffled hash join (SHJ) should be good for performance.
What is the problem the feature request solves?
Other Spark accelerators, such as Spark RAPIDS and Apache Gluten, replace SortMergeJoin with ShuffledHashJoin for improved performance. We should evaluate this approach for Comet.
Spark RAPIDS
Apache Gluten
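For readers less familiar with the two strategies, here is a minimal pure-Python sketch of an inner equi-join done both ways. It is only a toy illustration on in-memory lists of `(key, value)` rows. The function names `sort_merge_join` and `hash_join` are made up for this example and are not the Comet, DataFusion, RAPIDS, or Gluten implementations.

```python
from collections import defaultdict


def sort_merge_join(left, right):
    """Inner equi-join of (key, value) rows: sort both sides, then merge."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row in the run with every right row in the run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


def hash_join(build, probe):
    """Inner equi-join: build a hash table on one side, probe with the other."""
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    out = []
    for key, value in probe:
        # .get avoids inserting empty entries for keys with no match.
        for build_value in table.get(key, ()):
            out.append((key, build_value, value))
    return out


left = [(2, "b"), (1, "a"), (2, "c")]
right = [(2, "x"), (3, "y"), (2, "z"), (1, "w")]

smj_rows = sort_merge_join(left, right)
shj_rows = hash_join(build=left, probe=right)
```

Both functions return the same set of joined rows, only in a different order. The hash join skips the sort on both inputs, but it has to hold the whole build side in memory, which is the main trade-off to keep in mind when evaluating the replacement.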