AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Spark 2.3 no such method error #114

Closed kristenmcintosh closed 4 years ago

kristenmcintosh commented 4 years ago

Hi,

When trying to deserialize Confluent Avro messages, I get a NoSuchMethodError on Spark 2.3, but the same code works fine on Spark 2.4. I have updated the org.apache.avro dependency on Spark 2.3 to 1.9, as stated in the documentation. The issue only occurs when deserializing Confluent Avro; my serialization code runs fine on 2.3. I'm not sure whether this is an ABRiS issue or a more general Spark issue.

Although I know the code works on 2.4, my team is not yet ready to upgrade, so I still need this to run on 2.3.

Please see the stack trace below. Thanks.

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.boxedType(Lorg/apache/spark/sql/types/DataType;)Ljava/lang/String;
    at za.co.absa.abris.avro.sql.AvroDataToCatalyst$$anonfun$doGenCode$1.apply(AvroDataToCatalyst.scala:81)
    at za.co.absa.abris.avro.sql.AvroDataToCatalyst$$anonfun$doGenCode$1.apply(AvroDataToCatalyst.scala:80)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression$$anonfun$defineCodeGen$1.apply(Expression.scala:391)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression$$anonfun$defineCodeGen$1.apply(Expression.scala:390)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:407)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.defineCodeGen(Expression.scala:390)
    at za.co.absa.abris.avro.sql.AvroDataToCatalyst.doGenCode(AvroDataToCatalyst.scala:80)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:107)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:104)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.nullSafeCodeGen(Expression.scala:406)
    at org.apache.spark.sql.catalyst.expressions.GetStructField.doGenCode(complexTypeExtractors.scala:126)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:107)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:104)
    at org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:142)
    at org.apache.spark.sql.execution.ProjectExec$$anonfun$6.apply(basicPhysicalOperators.scala:60)
    at org.apache.spark.sql.execution.ProjectExec$$anonfun$6.apply(basicPhysicalOperators.scala:60)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:60)
    at org.apache.spark.sql.execution.CodegenSupport$class.constructDoConsumeFunction(WholeStageCodegenExec.scala:208)
    at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:179)
    at org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:354)
    at org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:383)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:354)
    at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:45)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:524)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:576)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryRelation.scala:107)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryRelation.scala:102)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryRelation.scala:43)
    at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:97)
    at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:67)
    at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:91)
    at org.apache.spark.sql.Dataset.persist(Dataset.scala:2924)
    at my.com.KafkaComsumer$.main(KafkaConsumer.scala:98)
    at my.com.KafkaComsume.main(KafkaConsumer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

cerveada commented 4 years ago

Hi, I checked and that method is in both Spark 2.3.x and 2.4.x. My guess is that there is some other version mismatch. Are your Abris and Spark artifacts built for the same Scala version?

cerveada commented 4 years ago

Which version of Abris are you using? It's probably a duplicate of this: https://github.com/AbsaOSS/ABRiS/issues/54

kristenmcintosh commented 4 years ago

I’m using Abris 3.1.2 with Scala 2.11 and Spark 2.3.

cerveada commented 4 years ago

You were right, it's a bug. In Spark 2.3 the method is org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.boxedType. In Spark 2.4 it was moved to org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.boxedType.

They are both in the same long file, so it's easy to miss.
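
For anyone who wants to confirm which of the two classes carries boxedType on their own classpath, here is a minimal diagnostic sketch in plain Java reflection. It is not the Abris fix, only a check. The class name BoxedTypeLocator and the helper findOwner are hypothetical names made up for this illustration; the two Spark class names are the ones quoted above. It compiles without Spark, and only gives a meaningful answer when run with the Spark jars on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical diagnostic helper, not part of Abris or Spark.
public class BoxedTypeLocator {

    // Candidate owners of boxedType, as named in the comment above.
    static final List<String> CANDIDATES = Arrays.asList(
        "org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$",  // Spark 2.4 location
        "org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext"); // Spark 2.3 location

    /**
     * Returns the first class name in classNames that can be loaded and
     * exposes a public method called methodName, or empty if none does.
     */
    static Optional<String> findOwner(List<String> classNames, String methodName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: inspect the class without running its static initializer
                Class<?> cls = Class.forName(name, false, loader);
                for (Method m : cls.getMethods()) {
                    if (m.getName().equals(methodName)) {
                        return Optional.of(name);
                    }
                }
            } catch (ClassNotFoundException | LinkageError e) {
                // Class is missing or cannot be linked on this classpath; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findOwner(CANDIDATES, "boxedType")
            .map(owner -> "boxedType found on " + owner)
            .orElse("boxedType not found; is spark-catalyst on the classpath?"));
    }
}
```

Going by the description above, a Spark 2.3 classpath should report CodegenContext and a Spark 2.4 classpath should report CodeGenerator$, which is why a library compiled against one location fails with NoSuchMethodError on the other.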

kristenmcintosh commented 4 years ago

Oh yes, that’s easy to miss for sure. I didn’t notice they were named differently either when I was looking into this. Thanks.

Any timeline on when this can be fixed?

cerveada commented 4 years ago

The plan is to release a new Abris version next week. It will include several other improvements.

kristenmcintosh commented 4 years ago

Okay, great!

cerveada commented 4 years ago

Fixed in 3.2.0