apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

SortMergeJoin with unsupported key type should fall back to Spark #354

Closed viirya closed 2 weeks ago

viirya commented 2 weeks ago

Describe the bug

DataFusion SortMergeJoin doesn't support all data types as join keys. See https://github.com/apache/datafusion/blob/dd5683745e7d527b01b804c8f4f1a0a53aa225e8/datafusion/physical-plan/src/joins/sort_merge_join.rs#L1521-L1523

Comet shouldn't transform a Spark SortMergeJoin into a DataFusion SortMergeJoin when the join keys have unsupported types. Currently it does, and the query fails at runtime with the following error:

  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 4) (192.168.86.44 executor driver): org.apache.comet.CometNativeException: This feature is not implemented: Unsupported data type in sort merge join comparator
        at org.apache.comet.Native.executePlan(Native Method)
        at org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:71)
        at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:123)
        at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:138)

This was found while fixing test failures in #250.
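
To make the intended behavior concrete, here is a rough sketch of a planning-time check that falls back to Spark when any join key type is unsupported. It is not Comet's or DataFusion's actual code: `KeyType`, `JoinPlan`, `is_supported_key`, and `plan_sort_merge_join` are hypothetical names, and the allow-list is only an assumption for illustration. The authoritative set of supported types is whatever the DataFusion comparator linked above handles.

```rust
/// Simplified stand-in for a join key's data type (hypothetical; real code
/// would inspect Spark's or Arrow's own type objects).
#[derive(Debug, Clone, PartialEq)]
enum KeyType {
    Boolean,
    Int32,
    Int64,
    Float64,
    Utf8,
    Date32,
    Struct(Vec<KeyType>),
    List(Box<KeyType>),
}

/// Which operator ends up running the join.
#[derive(Debug, PartialEq)]
enum JoinPlan {
    NativeSortMergeJoin,
    SparkFallback,
}

/// Assumed allow-list for this sketch only: flat types are accepted and
/// nested types are treated as unsupported.
fn is_supported_key(t: &KeyType) -> bool {
    matches!(
        t,
        KeyType::Boolean
            | KeyType::Int32
            | KeyType::Int64
            | KeyType::Float64
            | KeyType::Utf8
            | KeyType::Date32
    )
}

/// Choose the native join only if every key type is supported; otherwise
/// fall back while planning instead of failing during execution.
fn plan_sort_merge_join(key_types: &[KeyType]) -> JoinPlan {
    if key_types.iter().all(is_supported_key) {
        JoinPlan::NativeSortMergeJoin
    } else {
        JoinPlan::SparkFallback
    }
}
```

The point of the design is that the decision is made before any native plan is built, so an unsupported key type leaves the join on Spark's own SortMergeJoin rather than surfacing as a `CometNativeException` at task execution.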
