NVIDIA / spark-rapids-benchmarks

Spark RAPIDS Benchmarks – benchmark sets and utilities for the RAPIDS Accelerator for Apache Spark
Apache License 2.0

[BUG] Delta related jobs failed due to Spark version incompatibility #191

Closed wjxiz1992 closed 3 months ago

wjxiz1992 commented 3 months ago

Describe the bug This is almost identical to #189 . Each Delta release jar supports only one specific Spark version line. When an incompatible Delta jar is used against Spark, the following error is thrown:

java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DeltaDelete has interface org.apache.spark.sql.catalyst.plans.logical.UnaryNode as super class
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  at org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:64)
  at org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:57)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)

Steps/Code to reproduce bug

  1. Use Spark > 3.2.0.
  2. Launch spark-shell with an incompatible Delta jar:

     spark-3.2.1-bin-hadoop3.2/bin/spark-shell --packages io.delta:delta-core_2.12:1.0.1 --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
  3. Read an arbitrary file:

```Scala
spark.read.option("delimiter", "|").csv("../spark-rapids-benchmarks/origin_data_sf1/item")
```
Expected behavior No error.


Solution

  1. Use Delta jar version 1.1.0 for Spark 3.2.x.
  2. Add more info in the README to inform users.
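To make the README guidance checkable, a small helper could map a Spark version to a known-compatible delta-core release and emit the matching `--packages` coordinate. This is only an illustrative sketch: the helper name and the mapping table are assumptions covering just the two version lines discussed in this issue and #189 (delta-core 1.0.x targets Spark 3.1.x; delta-core 1.1.x targets Spark 3.2.x); other pairings should be taken from Delta's own release notes.

```python
# Hypothetical helper (not part of this repo): pick a delta-core
# version that matches the Spark version line in use.

# Assumed mapping, limited to the versions discussed in this issue.
COMPATIBLE_DELTA = {
    "3.1": "1.0.1",  # delta-core 1.0.x is built against Spark 3.1.x
    "3.2": "1.1.0",  # delta-core 1.1.x is built against Spark 3.2.x
}


def compatible_delta(spark_version: str) -> str:
    """Return a delta-core version compatible with the given Spark version."""
    major_minor = ".".join(spark_version.split(".")[:2])
    try:
        return COMPATIBLE_DELTA[major_minor]
    except KeyError:
        raise ValueError(
            f"No known compatible delta-core version for Spark {spark_version}"
        )


if __name__ == "__main__":
    # For the Spark 3.2.1 build used in the repro above, this prints the
    # coordinate that avoids the IncompatibleClassChangeError:
    delta = compatible_delta("3.2.1")
    print(f"--packages io.delta:delta-core_2.12:{delta}")
```

Launching spark-shell with the printed coordinate (delta-core_2.12:1.1.0 instead of 1.0.1) avoids the class-loading failure shown in the stack trace.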