AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Compatibility with Spark 3.5 #355

Closed KrAxmalL closed 9 months ago

KrAxmalL commented 9 months ago

Describe the bug

Using ABRiS 6.4.0 with Spark 3.5.0 fails with the following error:

Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/SQLConf$LegacyBehaviorPolicy$
    at org.apache.spark.sql.avro.AvroDeserializer.<init>(AvroDeserializer.scala:62)
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
    ... 153 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$
    ... 155 more

Possible reason

It seems the issue is caused by using the AvroDeserializer class (the one in the stack trace) from the spark-avro:3.2.4 dependency. It has the following import:

import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy

while in Spark 3.5.0 the class was moved, so the import became:

import org.apache.spark.sql.internal.LegacyBehaviorPolicy
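To confirm which of the two locations is actually available at runtime, a small classpath probe can help. The following is only a sketch: the object and method names are made up for illustration, the old class name is taken from the stack trace above, and the new class name assumes that LegacyBehaviorPolicy is still a Scala object in Spark 3.5.0 (hence the trailing `$`).

```scala
import scala.util.Try

// Hypothetical diagnostic helper, not part of ABRiS or Spark.
object LegacyBehaviorPolicyCheck {

  // Returns true when the named class can be loaded from the current classpath.
  // initialize = false avoids running static initializers while probing.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    // Location expected by spark-avro 3.2.4 (taken from the stack trace).
    val oldLocation = "org.apache.spark.sql.internal.SQLConf$LegacyBehaviorPolicy$"
    // Location after the move in Spark 3.5.0 (assumed to be a Scala object).
    val newLocation = "org.apache.spark.sql.internal.LegacyBehaviorPolicy$"

    println(s"old location present: ${isOnClasspath(oldLocation)}")
    println(s"new location present: ${isOnClasspath(newLocation)}")
  }
}
```

If only the new location is reported as present while spark-avro 3.2.4 is on the classpath, that would match the NoClassDefFoundError above.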

kevinwallimann commented 9 months ago

Hi @KrAxmalL, thanks for your bug report. Indeed, I was able to reproduce it by running:

spark-submit \
--class za.co.absa.abris.examples.ConfluentKafkaAvroWriter \
--master local \
--packages za.co.absa:abris_2.12:6.4.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
~/.m2/repository/za/co/absa/abris_2.12/6.4.0/abris_2.12-6.4.0.jar

If this is similar to how you use ABRiS, you can fix the problem by explicitly specifying the correct version of the spark-avro dependency in the --packages argument, e.g. like this:

spark-submit \
--class za.co.absa.abris.examples.ConfluentKafkaAvroWriter \
--master local \
--packages za.co.absa:abris_2.12:6.4.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0,org.apache.spark:spark-avro_2.12:3.5.0 \
~/.m2/repository/za/co/absa/abris_2.12/6.4.0/abris_2.12-6.4.0.jar

KrAxmalL commented 9 months ago

@kevinwallimann Thank you for the answer! Is it possible to make ABRiS use spark-avro 3.5 (or, more generally, the version that matches the Spark version) at build time when using SBT as the build tool?

KrAxmalL commented 9 months ago

@kevinwallimann Never mind, I found a way to address the issue. Thank you for the help. For anyone who runs into the same issue: exclude spark-avro (or all Spark dependencies) from ABRiS and add spark-avro as a compile-time dependency. Example for SBT:

"za.co.absa" %% "abris" % "6.4.0" excludeAll (ExclusionRule("org.apache.spark")),
"org.apache.spark" %% "spark-avro" % "3.5.0"
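
To keep spark-avro aligned with the Spark version at build time, the version can be held in a single value and reused across the Spark artifacts. The following build.sbt fragment is a minimal sketch of that idea, not a verified configuration: the sparkVersion value and the spark-sql entry with Provided scope are assumptions added for illustration.

```scala
// build.sbt (sketch). sparkVersion is a hypothetical helper value, so that
// spark-avro always follows the Spark version used by the job.
val sparkVersion = "3.5.0"

libraryDependencies ++= Seq(
  // Assumption: Spark itself is supplied by the cluster at runtime.
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  // spark-avro pinned to the same version as Spark.
  "org.apache.spark" %% "spark-avro" % sparkVersion,
  // Drop the Spark artifacts that ABRiS 6.4.0 would otherwise pull in transitively.
  ("za.co.absa" %% "abris" % "6.4.0")
    .excludeAll(ExclusionRule(organization = "org.apache.spark"))
)
```

With this layout, moving to another Spark release should only require changing sparkVersion, provided ABRiS itself remains compatible with that release.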