AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Fix NoSuchMethodException in Spark 3.5.x #352

Closed. kevinwallimann closed this issue 7 months ago

kevinwallimann commented 7 months ago

Describe the bug

As reported in #343, Abris fails with java.lang.NoSuchMethodException: org.apache.spark.sql.avro.AvroDeserializer.<init>(org.apache.avro.Schema, org.apache.spark.sql.types.DataType, java.lang.String) when trying to instantiate Spark's AvroDeserializer from AbrisAvroDeserializer.

This was reported for Spark on Databricks Runtime 14.2.

The issue is caused by a new constructor argument that was added in https://github.com/apache/spark/pull/44964. This argument is not present in the latest open-source Spark release, version 3.5.0 (https://github.com/apache/spark/blob/v3.5.0/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L53-L56), but has apparently already shipped in Databricks Runtime.
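One way to tolerate such a signature change when a class is instantiated reflectively is to try each known constructor shape in turn and use the first one that exists. The following is only a minimal sketch of that idea, not Abris' actual fix: NewerDeserializer is a hypothetical stand-in for a class whose constructor gained a trailing argument, and Candidate, instantiate and the argument values are names invented for this illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.List;

public class ConstructorFallback {

    /** Hypothetical stand-in for a class whose constructor gained a trailing argument. */
    public static class NewerDeserializer {
        public final String description;

        public NewerDeserializer(String schema, String rebaseMode, boolean extraFlag) {
            this.description = schema + "/" + rebaseMode + "/" + extraFlag;
        }
    }

    /** One known constructor shape plus the arguments to pass to it. */
    public static final class Candidate {
        final Class<?>[] parameterTypes;
        final Object[] arguments;

        public Candidate(Class<?>[] parameterTypes, Object... arguments) {
            this.parameterTypes = parameterTypes;
            this.arguments = arguments;
        }
    }

    /** Tries each candidate in order and invokes the first constructor that exists. */
    public static <T> T instantiate(Class<T> clazz, List<Candidate> candidates) {
        for (Candidate candidate : candidates) {
            try {
                Constructor<T> ctor = clazz.getConstructor(candidate.parameterTypes);
                return ctor.newInstance(candidate.arguments);
            } catch (NoSuchMethodException e) {
                // This shape does not exist in the loaded version; try the next one.
            } catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Constructor found but invocation failed", e);
            }
        }
        throw new IllegalStateException("No known constructor shape matches " + clazz.getName());
    }

    public static void main(String[] args) {
        List<Candidate> shapes = List.of(
            // Older shape: absent on the stand-in class, so it is skipped.
            new Candidate(new Class<?>[] {String.class, String.class}, "schema", "CORRECTED"),
            // Newer shape with the extra trailing argument.
            new Candidate(new Class<?>[] {String.class, String.class, boolean.class},
                "schema", "CORRECTED", false));
        System.out.println(instantiate(NewerDeserializer.class, shapes).description);
    }
}
```

The trade-off of this approach is that every new upstream signature has to be added to the candidate list by hand, and a sensible default must be chosen for each argument the caller does not otherwise supply.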

To Reproduce

To reproduce the bug locally, without Databricks Runtime, Spark needs to be built on branch-3.5. For example:

cd /tmp
git clone https://github.com/apache/spark.git
cd spark
git checkout branch-3.5
./build/mvn -DskipTests clean install

In Abris' pom.xml, replace <spark.version>3.5.0</spark.version> with <spark.version>3.5.1-SNAPSHOT</spark.version>.

Run

mvn clean test -Pspark-3.5

The tests then fail with errors such as:

- should replace undeserializable record with default SpecificRecord *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 55.0 failed 1 times, most recent failure: Lost task 1.0 in stage 55.0 (TID 111) (q2ncdftp47.home executor driver): java.lang.NoSuchMethodException: org.apache.spark.sql.avro.AvroDeserializer.<init>(org.apache.avro.Schema, org.apache.spark.sql.types.DataType, java.lang.String)

Expected behavior

All tests should pass.

Additional context

Another new constructor argument, stableIdPrefixForUnionType: String, was added in https://github.com/apache/spark/pull/44964, which is currently marked with fix version 4.0.0.
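Since the constructor shape differs between Spark builds, it may help to check which signatures the class on the classpath actually exposes. The sketch below is an illustrative diagnostic only: ConstructorSignatures and describe are hypothetical names, and the default class name in main assumes the Spark Avro connector is on the classpath (otherwise it just reports that the class is missing).

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ConstructorSignatures {

    /** Returns the declared constructors of a class as sorted, readable signatures. */
    public static List<String> describe(Class<?> clazz) {
        List<String> signatures = new ArrayList<>();
        for (Constructor<?> ctor : clazz.getDeclaredConstructors()) {
            String params = Arrays.stream(ctor.getParameterTypes())
                .map(Class::getName)
                .collect(Collectors.joining(", "));
            signatures.add(clazz.getName() + ".<init>(" + params + ")");
        }
        Collections.sort(signatures);
        return signatures;
    }

    public static void main(String[] args) {
        // Pass a fully qualified class name, or fall back to the class from this issue.
        String className = args.length > 0
            ? args[0]
            : "org.apache.spark.sql.avro.AvroDeserializer";
        try {
            describe(Class.forName(className)).forEach(System.out::println);
        } catch (ClassNotFoundException e) {
            System.out.println("Class not on classpath: " + className);
        }
    }
}
```

Running this against the same classpath as the failing job shows whether the three-argument constructor named in the exception is present, or only the longer variants.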