apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Comet named_struct fails on duplicate field names #1015

Closed viirya closed 1 month ago

viirya commented 1 month ago

Describe the bug

Spark's named_struct expression does not forbid duplicate field names, and Comet's named_struct implementation follows Spark. However, when Arrow Java imports arrays, it binds child arrays by field name. So if a struct array has duplicate field names, an error like this is thrown:

  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5959.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5959.0 (TID 15003) (192.168.86.44 executor driver): java.lang.IllegalStateException: ArrowArray struct has 2 children (expected 1)
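
To illustrate why binding by field name produces this child-count mismatch, here is a minimal sketch in Python. It is a simplified model and not the actual Arrow Java importer code: the function name bind_children_by_name and the use of a plain dict are assumptions made for illustration only. The point is that a name-keyed lookup collapses duplicate names into one entry, so a struct exported with two children appears to expect only one.

```python
# Simplified model of by-name binding. This is NOT the Arrow Java
# implementation; the function name and data structures are hypothetical.
def bind_children_by_name(field_names, child_arrays):
    """Bind each child array to a struct field using a name-keyed lookup."""
    fields_by_name = {}
    for name in field_names:
        # A duplicate name overwrites the earlier entry instead of adding one.
        fields_by_name[name] = name

    expected = len(fields_by_name)  # distinct names known to the importer
    actual = len(child_arrays)      # children actually present in the array
    if actual != expected:
        raise ValueError(
            f"ArrowArray struct has {actual} children (expected {expected})"
        )
    # With unique names, each child is paired with its field in order.
    return dict(zip(fields_by_name, child_arrays))


if __name__ == "__main__":
    # Unique names bind without error.
    print(bind_children_by_name(["a", "b"], [[1], [2]]))
    # Duplicate names collapse to one field, so the count check fails.
    try:
        bind_children_by_name(["a", "a"], [[1], [2]])
    except ValueError as err:
        print(err)
```

In this model, a struct with fields named a and a carries two child arrays, but the name-keyed map holds only one field, which matches the shape of the error above.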

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response