AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Abris #239 part 2 - Upgrade to spark 3.2 #245

Closed gundersborg closed 2 years ago

gundersborg commented 2 years ago

Upgrade to Spark 3.2.0. Second part of a fix for #239.

This PR builds on top of the changes already requested in PR #244. Since the Spark upgrade fails without those changes, I have included them in this PR too, in order to be able to check out the changes and run tests. If you prefer to have just the Spark change in this PR, let me know and I'll remove the commits from #244.

Currently, the suggested changes only work for Spark 3.2. Due to changes in Spark, the AvroDataToCatalyst and CatalystDataToAvro classes need to override a new withNewChildInternal method, which becomes an issue when supporting Spark 3.2 and older versions at the same time: without the override, compilation against Spark 3.2 fails and requires the Catalyst classes to be abstract, and with it, compilation against 3.1 and older fails because the new method isn't overriding anything. There might be an elegant way to solve this with traits or similar, but with my limited Scala experience I haven't managed to find one. This will likely decide whether Spark 3.2 support needs to be maintained in a separate branch or not.
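One direction that might be worth trying, which I have not verified against the actual ABRiS build: Scala only requires the `override` modifier when a method replaces a concrete member, not when it implements an abstract one. If withNewChildInternal is abstract in the Spark 3.2 base class, defining it without `override` could compile against both versions. The sketch below illustrates the language rule only. `OldSpark`, `NewSpark`, `AgainstOld`, `AgainstNew`, and the `String` child are hypothetical stand-ins, not the real Spark or ABRiS types.

```scala
// Hypothetical stand-ins for the base class in two Spark versions.
// These are NOT the real Spark classes; they only model the shape of the problem.
object OldSpark {
  // Older version: no withNewChildInternal at all.
  abstract class UnaryExpression {
    def child: String
  }
}

object NewSpark {
  // Newer version: withNewChildInternal is abstract and must be implemented.
  abstract class UnaryExpression {
    def child: String
    protected def withNewChildInternal(newChild: String): UnaryExpression
    def withNewChild(newChild: String): UnaryExpression =
      withNewChildInternal(newChild)
  }
}

// Same method definition, deliberately written WITHOUT the `override` modifier.
// Against the old base it is just an extra method on the class.
case class AgainstOld(child: String) extends OldSpark.UnaryExpression {
  protected def withNewChildInternal(newChild: String): AgainstOld =
    copy(child = newChild)
}

// Against the new base it implements the abstract member, which Scala
// allows without `override`.
case class AgainstNew(child: String) extends NewSpark.UnaryExpression {
  protected def withNewChildInternal(newChild: String): AgainstNew =
    copy(child = newChild)
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Both variants compile; only the new one exposes withNewChild.
    println(AgainstOld("a"))
    println(AgainstNew("a").withNewChild("b"))
  }
}
```

If this does not hold for the real classes (for example, if other new abstract members are involved), version-specific source directories or a separate branch would still be needed.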

AvroDeserializer now throws a generic scala.Exception in some places instead of IncompatibleSchemaException, since the latter is no longer accessible. It's not great, but once #240 is merged the ABRiS AvroDeserializer will be replaced with the one from Spark, so either this is fine as is or these changes can be piggybacked onto #240.

I have run the tests for Spark 3.2 with Scala 2.12 and they are green. I haven't tested Scala 2.11, since Spark 3.2 does not support it (and all the older Spark versions obviously fail due to withNewChildInternal not overriding anything).