AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Fix tests for Spark 3.5.0 #350

Closed: kevinwallimann closed this issue 10 months ago

kevinwallimann commented 10 months ago

Describe the bug

Running the tests fails with Spark 3.5.0.

To Reproduce

Steps to reproduce the behavior OR commands run:

  1. Check out latest master
  2. Change Spark version in pom.xml to 3.5.0
  3. Run mvn clean test (using Java 8)
  4. See errors below
[ERROR] /ABRiS/src/main/scala/za/co/absa/abris/examples/ConfluentKafkaAvroWriter.scala:88: error: value apply is not a member of object org.apache.spark.sql.catalyst.encoders.RowEncoder
[ERROR]     RowEncoder.apply(sparkSchema)
[ERROR]                ^
[ERROR] one error found

This error can be fixed by replacing RowEncoder.apply with org.apache.spark.sql.Encoders.row.
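A minimal sketch of that replacement, assuming Spark 3.5.0 on the classpath. The object name, the schema and the sample rows are made up for illustration and are not taken from ConfluentKafkaAvroWriter.scala. As far as I can tell, Encoders.row is only available from Spark 3.5.0 onwards, so code that also has to compile against 3.2.x to 3.4.x would probably need a version-specific source or shim.

```scala
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical example object, not part of ABRiS.
object RowEncoderMigrationSketch {

  // Illustrative schema; the real example builds its schema elsewhere.
  val sparkSchema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  def buildDataFrame(spark: SparkSession): DataFrame = {
    // Before (Spark <= 3.4): implicit val encoder = RowEncoder.apply(sparkSchema)
    // After (Spark 3.5.0): the public factory on Encoders
    implicit val encoder: Encoder[Row] = Encoders.row(sparkSchema)

    val rows = Seq(Row(1, "a"), Row(2, "b"))
    // createDataset picks up the implicit Encoder[Row] defined above
    spark.createDataset(rows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-encoder-sketch")
      .getOrCreate()
    try buildDataFrame(spark).show()
    finally spark.stop()
  }
}
```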

The next error is:

SchemaLoaderSpec:
SchemaLoader
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/exc/StreamConstraintsException
  at com.fasterxml.jackson.databind.node.JsonNodeFactory.objectNode(JsonNodeFactory.java:353)
  at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:100)
  at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:25)
  at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
  at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4801)
  at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3084)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
  at all_types.test.NativeSimpleOuter.<clinit>(NativeSimpleOuter.java:18)
  at za.co.absa.abris.examples.data.generation.TestSchemas$.<init>(TestSchemas.scala:35)

One way to fix this is to explicitly pin the jackson-core dependency to version 2.15.2, overriding v2.12.2, which is pulled in by avro 1.10.2. Spark 3.5.0 depends on jackson-databind v2.15.2, which references StreamConstraintsException, a class that only exists in jackson-core 2.15 and later:

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.15.2</version>
        </dependency>
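To verify which jackson-core actually ends up on the test classpath, a small reflection check along these lines might help. This is only a sketch: the object and method names are hypothetical, and it uses nothing beyond the JDK and the Scala standard library.

```scala
import scala.util.Try

// Hypothetical diagnostic helper, not part of ABRiS.
object JacksonClasspathCheck {

  // True if the class can be located, without running its static initializers.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  // Location (usually a jar) the class was loaded from, if the JVM reports one.
  def locationOf(className: String): Option[String] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      .flatMap(cls => Option(cls.getProtectionDomain.getCodeSource))
      .flatMap(source => Option(source.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    // The class named in the NoClassDefFoundError above
    val missing = "com.fasterxml.jackson.core.exc.StreamConstraintsException"
    // A class that exists in every jackson-core version, to find the jar in use
    val core = "com.fasterxml.jackson.core.JsonFactory"

    println(s"$missing present: ${isOnClasspath(missing)}")
    println(s"jackson-core loaded from: ${locationOf(core).getOrElse("not found")}")
  }
}
```

If the first line prints false while the second shows a jackson-core jar older than 2.15, the override above has not taken effect.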

Expected behavior

The tests should pass for the current Spark versions 3.2.4, 3.3.3, 3.4.2 and 3.5.0. These versions should be added to the test-and-verify GitHub Actions workflow.

Additional context

If you replace RowEncoder.apply with RowEncoder.encoderFor, the following exception is thrown in 18 tests.

- should convert all types of data to confluent avro an back using schema registry for key *** FAILED ***
  org.apache.spark.SparkRuntimeException: Only expression encoders are supported for now.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedEncoderError(QueryExecutionErrors.scala:477)
  at org.apache.spark.sql.catalyst.encoders.package$.encoderFor(package.scala:34)
  at org.apache.spark.sql.catalyst.plans.logical.CatalystSerde$.generateObjAttr(object.scala:47)
  at org.apache.spark.sql.execution.ExternalRDD$.apply(ExistingRDD.scala:35)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:498)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:367)
  at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:236)
  at za.co.absa.abris.avro.sql.CatalystAvroConversionSpec.getTestingDataFrame(CatalystAvroConversionSpec.scala:55)
  at za.co.absa.abris.avro.sql.CatalystAvroConversionSpec.$anonfun$new$23(CatalystAvroConversionSpec.scala:484)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  ...

The fix is to replace RowEncoder.encoderFor with org.apache.spark.sql.Encoders.row, as mentioned in https://issues.apache.org/jira/browse/SPARK-45311.

If you get java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x74ad2091) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x74ad2091, either run the tests with Java 8 or add the VM option --add-exports java.base/sun.nio.ch=ALL-UNNAMED.