databricks / spark-avro

Avro Data Source for Apache Spark
http://databricks.com/
Apache License 2.0

In 1.8 generated fixed types do not extend GenericData.Fixed anymore #256

Closed: steven-aerts closed this 6 years ago

steven-aerts commented 6 years ago

Fixed types generated with the Avro 1.8 code generator no longer extend GenericData.Fixed. This triggers a ClassCastException when an object of such a class is converted to a GenericRow.

java.lang.ClassCastException: com.technicolor.avro.common.MacAddress cannot be cast to org.apache.avro.generic.GenericData$Fixed
        at com.databricks.spark.avro.SchemaConverters$$anonfun$com$databricks$spark$avro$SchemaConverters$$createConverter$1$3.apply(SchemaConverters.scala:160)
        at com.databricks.spark.avro.SchemaConverters$$anonfun$com$databricks$spark$avro$SchemaConverters$$createConverter$1$5.apply(SchemaConverters.scala:207)
        at com.databricks.spark.avro.SchemaConverters$$anonfun$com$databricks$spark$avro$SchemaConverters$$createConverter$1$6$$anonfun$apply$1.apply(SchemaConverters.scala:227)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at com.databricks.spark.avro.SchemaConverters$$anonfun$com$databricks$spark$avro$SchemaConverters$$createConverter$1$6.apply(SchemaConverters.scala:222)
        at com.databricks.spark.avro.SchemaConverters$$anonfun$com$databricks$spark$avro$SchemaConverters$$createConverter$1$5.apply(SchemaConverters.scala:207)
        at com.technicolor.doctor.hadoop.explore.SparkEventsEnvelope$AvroToRowConverter.call(SparkEventsEnvelope.java:249)

This patch solves the problem by casting to the GenericFixed base type instead of GenericData.Fixed.
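The difference between the two casts can be illustrated without Avro on the classpath. The following is a minimal sketch, not the actual spark-avro change: `FixedLike`, `DataFixed`, `MacAddress17`, and `MacAddress18` are hypothetical stand-ins that only mirror the shape of `GenericFixed`, `GenericData.Fixed`, and the classes emitted by the 1.7 and 1.8 generators as described in this report.

```java
import java.util.Arrays;

class FixedCastDemo {
    /** Hypothetical stand-in for the GenericFixed interface (not the Avro type). */
    interface FixedLike {
        byte[] bytes();
    }

    /** Hypothetical stand-in for the concrete GenericData.Fixed class. */
    static class DataFixed implements FixedLike {
        private final byte[] bytes;
        DataFixed(byte[] bytes) { this.bytes = bytes; }
        @Override public byte[] bytes() { return bytes; }
    }

    /** Shape of a class from the 1.7 generator: extends the concrete class. */
    static class MacAddress17 extends DataFixed {
        MacAddress17(byte[] bytes) { super(bytes); }
    }

    /** Shape of a class from the 1.8 generator: implements only the interface. */
    static class MacAddress18 implements FixedLike {
        private final byte[] bytes;
        MacAddress18(byte[] bytes) { this.bytes = bytes; }
        @Override public byte[] bytes() { return bytes; }
    }

    /** Casts to the concrete class; throws ClassCastException for MacAddress18. */
    static byte[] convertViaConcreteClass(Object item) {
        return ((DataFixed) item).bytes().clone();
    }

    /** Casts to the interface; accepts both generated shapes. */
    static byte[] convertViaInterface(Object item) {
        return ((FixedLike) item).bytes().clone();
    }

    public static void main(String[] args) {
        byte[] mac = {0, 1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(convertViaInterface(new MacAddress17(mac))));
        System.out.println(Arrays.toString(convertViaInterface(new MacAddress18(mac))));
        try {
            convertViaConcreteClass(new MacAddress18(mac));
        } catch (ClassCastException e) {
            System.out.println("Cast to the concrete class failed: " + e.getMessage());
        }
    }
}
```

Because the interface sits above both shapes in the hierarchy, casting to it should not change behavior for classes generated with the older code generator.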

codecov-io commented 6 years ago

Codecov Report

Merging #256 into branch-4.0 will not change coverage. The diff coverage is 100%.

@@             Coverage Diff             @@
##           branch-4.0     #256   +/-   ##
===========================================
  Coverage       90.71%   90.71%           
===========================================
  Files               5        5           
  Lines             334      334           
  Branches           50       50           
===========================================
  Hits              303      303           
  Misses             31       31

gengliangwang commented 6 years ago

Hi @steven-aerts, is this patch compatible with Avro 1.7?

steven-aerts commented 6 years ago

Yes, it is, as GenericFixed is much older than 1.8.

Btw, spark-avro is still using Avro 1.7.6, and everything still works after this patch.

gengliangwang commented 6 years ago

I gave it a quick try with the Python avro 1.8.2 package, and spark-avro can load an Avro file with a fixed field. Can you share more details about how the read fails?

steven-aerts commented 6 years ago

Take the following schema:

{
  "type": "record",
  "name": "Record",
  "namespace": "org.example.bug",
  "fields": [{"name": "macAddress", "type": {"type": "fixed",  "size": 6, "name":"MacAddress" }}]
}

Then generate the Java code for it:

java -jar /usr/localdisk/software/avro-tools/avro-tools-1.8.0.jar schema compile record.avsc src/main/java

Then you can write a test like this:

@Test
public void testBug256() {
    // The schema declares MacAddress as a fixed of size 6, so use a 6-byte array.
    Record record = Record.newBuilder().setMacAddress(new MacAddress(new byte[6])).build();
    Schema schema = Record.getClassSchema();
    StructType structType = (StructType) SchemaConverters$.MODULE$.toSqlType(schema).dataType();
    // Before the patch, this conversion throws the ClassCastException shown above.
    GenericRow test = (GenericRow) SchemaConverters$.MODULE$.createConverterToSQL(schema, structType).apply(record);
}

This test fails with the stack trace above. With code generated by the 1.7 code generator, it passes.

gengliangwang commented 6 years ago

Thanks, merged to master/branch-4.0.