With the new configurable schema converter feature (#268, #269), DefaultSchemaConverter is instantiated by default as the member variable schemaConverter in AvroDataToCatalyst. Even though AvroDataToCatalyst, as a case class, is serializable by default, the eagerly created schemaConverter field becomes part of its serialized state, and DefaultSchemaConverter is not serializable. As a result, serialization fails when using the JavaSerializer, with the following error message:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: za.co.absa.abris.avro.sql.DefaultSchemaConverter
Serialization stack:
- object not serializable (class: za.co.absa.abris.avro.sql.DefaultSchemaConverter, value: za.co.absa.abris.avro.sql.DefaultSchemaConverter@1ce2ce83)
- field (class: za.co.absa.abris.avro.sql.AvroDataToCatalyst, name: schemaConverter, type: interface za.co.absa.abris.avro.sql.SchemaConverter)
- object (class za.co.absa.abris.avro.sql.AvroDataToCatalyst, from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]})))
- field (class: org.apache.spark.sql.catalyst.expressions.IsNotNull, name: child, type: class org.apache.spark.sql.catalyst.expressions.Expression)
- object (class org.apache.spark.sql.catalyst.expressions.IsNotNull, isnotnull(from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]}))))
- field (class: org.apache.spark.sql.execution.FilterExec, name: condition, type: class org.apache.spark.sql.catalyst.expressions.Expression)
- object (class org.apache.spark.sql.execution.FilterExec, Filter isnotnull(from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]})))
How to fix
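~Add the Serializable trait to the SchemaConverter trait.~
Make schemaConverter lazy, so that the converter is created on first use rather than when AvroDataToCatalyst is constructed.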
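The following is a minimal, self-contained sketch of the difference between an eager and a lazy field under Java serialization. It is not the ABRiS code: Converter, DefaultConverter, EagerExpr, LazyExpr and SerializationDemo are hypothetical stand-ins, and marking the lazy field @transient is an assumption of this sketch. A plain lazy val is only safe if it has not been initialized before the object is serialized, whereas @transient keeps the field out of the serialized state either way.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for SchemaConverter / DefaultSchemaConverter.
// Deliberately NOT Serializable, to mirror the failure in the stack trace.
trait Converter { def name: String }
class DefaultConverter extends Converter { val name = "default" }

// Eager field: the converter instance is part of the serialized object graph.
case class EagerExpr(schema: String) {
  val converter: Converter = new DefaultConverter
}

// Lazy, transient field: skipped by Java serialization and
// re-created on first access after deserialization.
case class LazyExpr(schema: String) {
  @transient lazy val converter: Converter = new DefaultConverter
}

object SerializationDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def javaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val lazyExpr = LazyExpr("{}")
    lazyExpr.converter // force initialization before serializing

    println(s"eager field serializable: ${javaSerializable(EagerExpr("{}"))}")
    println(s"lazy field serializable:  ${javaSerializable(lazyExpr)}")
  }
}
```

In this sketch, the eager variant fails in the same way as the stack trace above, while the lazy variant serializes even after the converter has been accessed.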