AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Issue with from_avro in Abris 6.1.1 #308

Closed jessiedanwang closed 2 years ago

jessiedanwang commented 2 years ago

We have used from_avro in Abris 5.1.1 with Spark 3.1.2 (on EMR 6.5) without any issue. However, the following error started showing up when we tried to upgrade to Abris 6.1.1 or 6.0.0 with Spark 3.2.0 (on EMR 6.6):

stream.select(from_avro(col("value"), abrisConfig))

Exception in thread "pool-28-thread-3" java.lang.AbstractMethodError: Method za/co/absa/abris/avro/sql/AvroDataToCatalyst.child()Lorg/apache/spark/sql/catalyst/trees/TreeNode; is abstract
    at za.co.absa.abris.avro.sql.AvroDataToCatalyst.child(AvroDataToCatalyst.scala)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.children(TreeNode.scala:1153)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.children$(TreeNode.scala:1153)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.children$lzycompute(Expression.scala:473)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.children(Expression.scala:473)
    at org.apache.spark.sql.catalyst.trees.TreeNode.getDefaultTreePatternBits(TreeNode.scala:141)
    at org.apache.spark.sql.catalyst.trees.TreeNode.treePatternBits$lzycompute(TreeNode.scala:152)
    at org.apache.spark.sql.catalyst.trees.TreeNode.treePatternBits(TreeNode.scala:152)
    at org.apache.spark.sql.catalyst.trees.TreeNode.getDefaultTreePatternBits(TreeNode.scala:143)
    at org.apache.spark.sql.catalyst.trees.TreeNode.treePatternBits$lzycompute(TreeNode.scala:152)
    at org.apache.spark.sql.catalyst.trees.TreeNode.treePatternBits(TreeNode.scala:152)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.treePatternBits$lzycompute(QueryPlan.scala:62)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.treePatternBits(QueryPlan.scala:57)
    at org.apache.spark.sql.catalyst.trees.TreePatternBits.containsPattern(TreePatternBits.scala:32)
    at org.apache.spark.sql.catalyst.trees.TreePatternBits.containsPattern$(TreePatternBits.scala:31)
    at org.apache.spark.sql.catalyst.trees.TreeNode.containsPattern(TreeNode.scala:118)
    at org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields$.$anonfun$apply$1(UpdateFields.scala:76)
    at org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields$.$anonfun$apply$1$adapted(UpdateFields.scala:76)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:167)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveExpressionsWithPruning(AnalysisHelper.scala:244)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveExpressionsWithPruning$(AnalysisHelper.scala:242)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveExpressionsWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields$.apply(UpdateFields.scala:76)
    at org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields$.apply(UpdateFields.scala:33)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:215)
    at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
    at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
    at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:38)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:212)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:284)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:327)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:284)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:274)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:274)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:188)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:215)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:209)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:172)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:193)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:90)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:90)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:88)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:80)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:92)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:90)
    at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3808)
    at org.apache.spark.sql.Dataset.select(Dataset.scala:1495)

cerveada commented 2 years ago

Abris 6.1.1 is compiled against Spark 3.2, so this should not be a problem. Are you sure you used the versions you mentioned?
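
An AbstractMethodError usually means that a class compiled against one version of a library is running against another, so one way to check is to see which jar each class is actually loaded from at runtime. The following is only a minimal sketch: the `JarLocator` helper is hypothetical (it is not part of ABRiS or Spark), and the two class names are simply taken from the stack trace.

```scala
import scala.util.Try

// Hypothetical diagnostic helper, not part of ABRiS or Spark.
object JarLocator {

  // Returns the location (usually a jar path) a class was loaded from.
  // Returns None if the class is not on the classpath or has no code source
  // (for example, JDK classes loaded by the bootstrap class loader).
  def jarOf(className: String): Option[String] =
    for {
      // initialize = false: look the class up without running its static initializers
      cls <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      src <- Option(cls.getProtectionDomain.getCodeSource)
      loc <- Option(src.getLocation)
    } yield loc.toString

  def main(args: Array[String]): Unit = {
    // Class names copied from the stack trace in this issue.
    val names = Seq(
      "za.co.absa.abris.avro.sql.AvroDataToCatalyst",
      "org.apache.spark.sql.catalyst.expressions.UnaryExpression"
    )
    names.foreach { n =>
      println(s"$n -> ${jarOf(n).getOrElse("not found, or no code source")}")
    }
  }
}
```

Run inside the same application (or paste `jarOf` into spark-shell on the cluster), the printed jar names should show whether the ABRiS and Spark jars come from matching versions. Printing `spark.version` on the active session is another quick check of the Spark runtime version.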

jessiedanwang commented 2 years ago

I just realized that I was using some Spark libs compiled against Spark 3.0.1. After updating them to Spark 3.2.0, the issue disappeared. Thanks.
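
For anyone hitting the same error, the fix amounts to keeping every Spark artifact on one version. A rough sketch of what that could look like in an sbt build follows; the build tool and the exact list of Spark modules are assumptions, since the thread does not say which libs were out of date.

```scala
// build.sbt (sketch only; the module list is an assumption, adjust to the job's real dependencies)

// Single source of truth for the Spark version, matching the cluster (Spark 3.2.0 on EMR 6.6).
val sparkVersion = "3.2.0"

// ABRiS depends on Confluent artifacts, which are hosted in Confluent's Maven repository.
resolvers += "confluent" at "https://packages.confluent.io/maven/"

libraryDependencies ++= Seq(
  // Provided: the cluster supplies Spark itself at runtime.
  "org.apache.spark" %% "spark-sql"            % sparkVersion % Provided,
  "org.apache.spark" %% "spark-avro"           % sparkVersion,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
  // ABRiS 6.1.1 is compiled against Spark 3.2.
  "za.co.absa"       %% "abris"                % "6.1.1"
)
```

Sharing one `sparkVersion` value makes it harder for a single module to stay behind on an older release such as 3.0.1.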