AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

Error is reported when a query involves a broadcast #619

Closed by zyw8136 1 year ago

zyw8136 commented 1 year ago

When a query involves a broadcast, the following error is reported in Python:

: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.FilePartition.files()[Lorg/apache/spark/sql/execution/datasources/PartitionedFile;
    at za.co.absa.spline.harvester.plugin.embedded.RDDPlugin$$anonfun$rddReadNodeProcessor$1$$anonfun$1.apply(RDDPlugin.scala:42)
    at za.co.absa.spline.harvester.plugin.embedded.RDDPlugin$$anonfun$rddReadNodeProcessor$1$$anonfun$1.apply(RDDPlugin.scala:42)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at za.co.absa.spline.harvester.plugin.embedded.RDDPlugin$$anonfun$rddReadNodeProcessor$1.applyOrElse(RDDPlugin.scala:42)
    at za.co.absa.spline.harvester.plugin.embedded.RDDPlugin$$anonfun$rddReadNodeProcessor$1.applyOrElse(RDDPlugin.scala:40)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at za.co.absa.spline.harvester.plugin.embedded.JDBCPlugin$$anonfun$rddReadNodeProcessor$1.applyOrElse(JDBCPlugin.scala:47)
    at za.co.absa.spline.harvester.plugin.embedded.JDBCPlugin$$anonfun$rddReadNodeProcessor$1.applyOrElse(JDBCPlugin.scala:47)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at za.co.absa.spline.harvester.builder.read.PluggableReadCommandExtractor$$anonfun$2.applyOrElse(PluggableReadCommandExtractor.scala:52)
    at za.co.absa.spline.harvester.builder.read.PluggableReadCommandExtractor$$anonfun$2.applyOrElse(PluggableReadCommandExtractor.scala:50)
    at scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
    at scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
    at scala.PartialFunction$.condOpt(PartialFunction.scala:286)
    at za.co.absa.spline.harvester.builder.read.PluggableReadCommandExtractor.asReadCommand(PluggableReadCommandExtractor.scala:50)
    at za.co.absa.spline.harvester.LineageHarvester.za$co$absa$spline$harvester$LineageHarvester$$createOperationBuilder(LineageHarvester.scala:195)
    at za.co.absa.spline.harvester.LineageHarvester$$anonfun$14.apply(LineageHarvester.scala:171)
    at za.co.absa.spline.harvester.LineageHarvester$$anonfun$14.apply(LineageHarvester.scala:171)
    at scala.Option.getOrElse(Option.scala:121)
    at za.co.absa.spline.harvester.LineageHarvester.traverseAndCollect$1(LineageHarvester.scala:171)
    at za.co.absa.spline.harvester.LineageHarvester.za$co$absa$spline$harvester$LineageHarvester$$createOperationBuildersRecursively(LineageHarvester.scala:190)
    at za.co.absa.spline.harvester.LineageHarvester$$anonfun$harvest$2.apply(LineageHarvester.scala:67)
    at za.co.absa.spline.harvester.LineageHarvester$$anonfun$harvest$2.apply(LineageHarvester.scala:65)
    at scala.Option.flatMap(Option.scala:171)
    at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:65)
    at za.co.absa.spline.agent.SplineAgent$$anon$2$$anonfun$handle$1.apply$mcV$sp(SplineAgent.scala:91)
    at za.co.absa.spline.agent.SplineAgent$$anon$2.withErrorHandling(SplineAgent.scala:100)
    at za.co.absa.spline.agent.SplineAgent$$anon$2.handle(SplineAgent.scala:72)
    at za.co.absa.spline.harvester.listener.QueryExecutionListenerDelegate.onSuccess(QueryExecutionListenerDelegate.scala:28)
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:41)
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:41)
    at scala.Option.foreach(Option.scala:257)
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:41)
    at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:124)
    at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:123)
    at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:145)
    at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:143)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
    at org.apache.spark.sql.util.ExecutionListenerManager.org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling(QueryExecutionListener.scala:143)
    at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply$mcV$sp(QueryExecutionListener.scala:123)
    at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:123)
    at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:123)
    at org.apache.spark.sql.util.ExecutionListenerManager.readLock(QueryExecutionListener.scala:156)
    at org.apache.spark.sql.util.ExecutionListenerManager.onSuccess(QueryExecutionListener.scala:122)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3367)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

cerveada commented 1 year ago

What version of Spark and Spline do you use?

zyw8136 commented 1 year ago

Spark 2.4; Spline 1.0.4

cerveada commented 1 year ago

And Scala version?

Are you sure you are using the correct Spline agent bundle? A mismatched bundle is often the cause of such issues.
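
For anyone checking this on their own setup, here is a minimal sketch of the bundle check. It assumes the agent bundle artifact is named `spark-<Spark major.minor>-spline-agent-bundle_<Scala major.minor>`; the helper `bundle_matches` and its parameters are hypothetical names used only for illustration, not part of Spline.

```python
import re

# Assumed artifact naming: spark-<major.minor>-spline-agent-bundle_<scala major.minor>
BUNDLE_PATTERN = re.compile(r"^spark-(\d+\.\d+)-spline-agent-bundle_(\d+\.\d+)$")


def major_minor(version):
    """Reduce a version such as '2.4.8' to its 'major.minor' part ('2.4')."""
    return ".".join(version.split(".")[:2])


def bundle_matches(artifact_id, spark_version, scala_version):
    """Return True if the bundle targets the runtime's Spark and Scala major.minor."""
    match = BUNDLE_PATTERN.match(artifact_id)
    if match is None:
        raise ValueError(f"unrecognized bundle artifact id: {artifact_id!r}")
    bundle_spark, bundle_scala = match.groups()
    return (bundle_spark == major_minor(spark_version)
            and bundle_scala == major_minor(scala_version))
```

In a PySpark session the Spark version is available as `spark.version`. The Scala version can usually be read through the Py4J gateway with `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`, although `_jvm` is an internal attribute and may change.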

zyw8136 commented 1 year ago

[Two images attached; their contents are not preserved here.]

cerveada commented 1 year ago

Please use the latest Spark 2.4 release, which is 2.4.8. That will fix the issue.
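
As background for readers who hit a similar `NoSuchMethodError`: the message names the exact method signature the calling code was compiled against, written as a JVM descriptor, and the error means the class found on the classpath at runtime does not have a method with that signature. The sketch below decodes such a message into a readable form. The helper names `decode_type` and `describe_missing_method` are hypothetical, and parameter descriptors are left undecoded to keep the example short.

```python
# JVM descriptor codes for primitive types and void.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def decode_type(descriptor):
    """Turn one JVM type descriptor, e.g. '[Ljava/lang/String;', into 'java.lang.String[]'."""
    dims = len(descriptor) - len(descriptor.lstrip("["))  # each leading '[' is one array dimension
    base = descriptor[dims:]
    if base.startswith("L") and base.endswith(";"):
        name = base[1:-1].replace("/", ".")
    elif base in PRIMITIVES:
        name = PRIMITIVES[base]
    else:
        raise ValueError(f"unsupported descriptor: {descriptor!r}")
    return name + "[]" * dims


def describe_missing_method(message):
    """Split a NoSuchMethodError message into class, method, raw params, and return type."""
    owner_and_name, rest = message.split("(", 1)
    params, return_descriptor = rest.split(")", 1)
    owner, method = owner_and_name.rsplit(".", 1)
    return {
        "class": owner,
        "method": method,
        "params": params,  # raw parameter descriptors, not decoded here
        "returns": decode_type(return_descriptor),
    }


info = describe_missing_method(
    "org.apache.spark.sql.execution.datasources.FilePartition.files()"
    "[Lorg/apache/spark/sql/execution/datasources/PartitionedFile;"
)
print(info["class"], info["method"], info["returns"])
```

For the error in this issue, the decoded result says the agent expected `FilePartition.files()` to return an array of `PartitionedFile`, and the Spark build on the classpath did not provide a method with that signature.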