apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

compatibility issue with AWS EMR 6.15.0 SPARK 3.4.1 #411

Closed ceppelli closed 1 month ago

ceppelli commented 1 month ago

Describe the bug

datafusion-comet compiled for and run on AWS EMR version emr-6.15.0 with Spark 3.4.1 fails: a simple query over a Parquet file throws a scala.MatchError.

How to reproduce the issue

scala> (0 until 10).toDF("a").write.mode("overwrite").parquet("/tmp/test")
scala> spark.read.parquet("/tmp/test").createOrReplaceTempView("t1")
scala> spark.sql("select * from t1 where a > 5").show
scala.MatchError: 8 (of class java.lang.Integer)
  at org.apache.comet.shims.ShimCometScanExec.$anonfun$newFileScanRDD$1(ShimCometScanExec.scala:73)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)

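The cause of the problem

The file spark-sql_2.12-3.4.1-amzn-2.jar is a custom build of Spark. Its class org.apache.spark.sql.execution.datasources.FileScanRDD has two constructors, one with 6 parameters and one with 8 parameters. The "MatchError: 8" raised in ShimCometScanExec.newFileScanRDD suggests that the shim selects a constructor by its parameter count and has no case for the 8-parameter one.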
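
To illustrate the failure mode, here is a minimal sketch of reflection-based constructor selection that reports an unknown parameter count with a readable error instead of a bare scala.MatchError. It is not Comet's actual shim code: PatchedScan, ConstructorProbe, arities, and instantiate are hypothetical names, and the stand-in class does not reproduce the real FileScanRDD signature. It is meant to be compiled as a regular source file.

```scala
// Hypothetical stand-in for a vendor-patched class that exposes two public
// constructors; it is NOT the real FileScanRDD signature.
class PatchedScan(val path: String, val partitions: Int, val extra: String) {
  def this(path: String, partitions: Int) = this(path, partitions, "default")
}

object ConstructorProbe {
  // Parameter counts of every public constructor, in ascending order.
  def arities(cls: Class[_]): Seq[Int] =
    cls.getConstructors.map(_.getParameterCount).sorted.toSeq

  // Instantiate cls with the first constructor whose parameter count has an
  // argument list in argsByArity (widest constructor tried first). An unknown
  // arity yields a descriptive error rather than a bare scala.MatchError.
  def instantiate[T](cls: Class[T], argsByArity: Map[Int, Seq[AnyRef]]): T = {
    val candidates = cls.getConstructors.toSeq.sortBy(c => -c.getParameterCount)
    val usable = candidates.flatMap { c =>
      argsByArity.get(c.getParameterCount).map(args => (c, args))
    }
    usable.headOption match {
      case Some((ctor, args)) => ctor.newInstance(args: _*).asInstanceOf[T]
      case None =>
        throw new UnsupportedOperationException(
          s"${cls.getName}: no handler for constructors with " +
            s"${arities(cls).mkString(", ")} parameters")
    }
  }
}
```

To check which constructors a given Spark build actually ships, the same reflection call can be run in spark-shell: classOf[org.apache.spark.sql.execution.datasources.FileScanRDD].getConstructors.map(_.getParameterCount). On the EMR build described above this should list 6 and 8.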
