AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

Unable to use with AWS Glue 4.0 #630

Status: Closed by imjaleel 1 year ago

imjaleel commented 1 year ago

I'm running a PySpark job that applies some transformations and writes the output to Delta tables in S3. AWS Glue 4.0 uses Spark 3.3 and Scala 2.12, so I added spark-3.3-spline-agent-bundle_2.12-1.0.6.jar to the job's dependent JARs and set the following job parameter:

    --packages za.co.absa.spline.agent.spark:spark-3.3-spline-agent-bundle_2.12:1.0.6

I also added these settings to the Spark session:

    .config("spark.sql.queryExecutionListeners", "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener") \
    .config("spark.spline.lineageDispatcher", "console")
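For reference, the two session settings above can be kept in one place and applied to a PySpark builder in a loop. This is a minimal sketch: `spline_conf`, `apply_conf` and `DryRunBuilder` are hypothetical helpers (not part of Spline, Spark or Glue), and only the two configuration keys and their values come from the report. `DryRunBuilder` merely records `config(key, value)` calls so the settings can be checked without starting Spark.

```python
from typing import Any, Dict, List, Tuple

# Listener class name exactly as used in the report.
SPLINE_LISTENER = "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"


def spline_conf(dispatcher: str = "console") -> Dict[str, str]:
    """Hypothetical helper returning the settings from the report."""
    return {
        # Registers Spline's listener so it is notified of query executions.
        "spark.sql.queryExecutionListeners": SPLINE_LISTENER,
        # Selects how harvested lineage is dispatched ("console" in the report).
        "spark.spline.lineageDispatcher": dispatcher,
    }


def apply_conf(builder: Any, conf: Dict[str, str]) -> Any:
    """Chain builder.config(key, value) for every setting, as in the snippet."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


class DryRunBuilder:
    """Hypothetical stand-in for SparkSession.builder that only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def config(self, key: str, value: str) -> "DryRunBuilder":
        self.calls.append((key, value))
        return self


if __name__ == "__main__":
    # Print the settings that would be applied, without starting Spark.
    for key, value in apply_conf(DryRunBuilder(), spline_conf()).calls:
        print(f"{key}={value}")
```

In the Glue script itself, the same helper would be used as `apply_conf(SparkSession.builder, spline_conf()).getOrCreate()` after `from pyspark.sql import SparkSession`. Note that in this report the listener evidently did register, since `SplineQueryExecutionListener.onSuccess` appears in the stack trace, so the failure happens after configuration, while Spline processes the query.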

The job starts, but while the output is being written to S3 it fails with the following error:

ERROR [spark-listener-group-shared] util.Utils (Logging.scala:logError(98)): uncaught error in thread spark-listener-group-shared, stopping SparkContext
java.lang.ExceptionInInitializerError: null
    at za.co.absa.spline.harvester.HashBasedUUIDGenerator.nextId(idGenerators.scala:57) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.DataTypeIdGenerator.nextId(idGenerators.scala:83) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.converter.DataTypeConverter.convert(DataTypeConverter.scala:39) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.agent.SplineAgent$$anon$1$$anon$2.za$co$absa$commons$lang$CachingConverter$$super$convert(SplineAgent.scala:76) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.commons.lang.CachingConverter.$anonfun$convert$1(converters.scala:47) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at scala.collection.mutable.MapLike.getOrElseUpdate(MapLike.scala:206) ~[scala-library.jar:?]
    at scala.collection.mutable.MapLike.getOrElseUpdate$(MapLike.scala:203) ~[scala-library.jar:?]
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80) ~[scala-library.jar:?]
    at za.co.absa.commons.lang.CachingConverter.convert(converters.scala:47) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.commons.lang.CachingConverter.convert$(converters.scala:44) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.agent.SplineAgent$$anon$1$$anon$2.convert(SplineAgent.scala:76) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.agent.SplineAgent$$anon$1$$anon$2.convert(SplineAgent.scala:76) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.converter.DataTypeConverter.convert(DataTypeConverter.scala:43) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.converter.AttributeConverter.convert(AttributeConverter.scala:42) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.PlanOperationNodeBuilder$$anon$1.za$co$absa$commons$lang$CachingConverter$$super$convert(PlanOperationNodeBuilder.scala:37) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.commons.lang.CachingConverter.$anonfun$convert$1(converters.scala:47) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at scala.collection.mutable.MapLike.getOrElseUpdate(MapLike.scala:206) ~[scala-library.jar:?]
    at scala.collection.mutable.MapLike.getOrElseUpdate$(MapLike.scala:203) ~[scala-library.jar:?]
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80) ~[scala-library.jar:?]
    at za.co.absa.commons.lang.CachingConverter.convert(converters.scala:47) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.commons.lang.CachingConverter.convert$(converters.scala:44) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.PlanOperationNodeBuilder$$anon$1.convert(PlanOperationNodeBuilder.scala:37) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.PlanOperationNodeBuilder.$anonfun$outputAttributes$1(PlanOperationNodeBuilder.scala:67) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) ~[scala-library.jar:?]
    at scala.collection.immutable.List.foreach(List.scala:388) ~[scala-library.jar:?]
    at scala.collection.TraversableLike.map(TraversableLike.scala:233) ~[scala-library.jar:?]
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226) ~[scala-library.jar:?]
    at scala.collection.immutable.List.map(List.scala:294) ~[scala-library.jar:?]
    at za.co.absa.spline.harvester.builder.plan.PlanOperationNodeBuilder.outputAttributes(PlanOperationNodeBuilder.scala:67) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.PlanOperationNodeBuilder.outputAttributes$(PlanOperationNodeBuilder.scala:66) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.read.ReadNodeBuilder.outputAttributes$lzycompute(ReadNodeBuilder.scala:28) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.read.ReadNodeBuilder.outputAttributes(ReadNodeBuilder.scala:28) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.read.ReadNodeBuilder.build(ReadNodeBuilder.scala:42) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.builder.plan.read.ReadNodeBuilder.build(ReadNodeBuilder.scala:28) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$6(LineageHarvester.scala:68) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) ~[scala-library.jar:?]
    at scala.collection.immutable.List.foreach(List.scala:388) ~[scala-library.jar:?]
    at scala.collection.TraversableLike.map(TraversableLike.scala:233) ~[scala-library.jar:?]
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226) ~[scala-library.jar:?]
    at scala.collection.immutable.List.map(List.scala:294) ~[scala-library.jar:?]
    at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$4(LineageHarvester.scala:68) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at scala.Option.flatMap(Option.scala:171) ~[scala-library.jar:?]
    at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:61) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.agent.SplineAgent$$anon$1.$anonfun$handle$1(SplineAgent.scala:91) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.agent.SplineAgent$$anon$1.withErrorHandling(SplineAgent.scala:100) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.agent.SplineAgent$$anon$1.handle(SplineAgent.scala:72) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.listener.QueryExecutionListenerDelegate.onSuccess(QueryExecutionListenerDelegate.scala:28) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1(SplineQueryExecutionListener.scala:41) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1$adapted(SplineQueryExecutionListener.scala:41) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at scala.Option.foreach(Option.scala:257) ~[scala-library.jar:?]
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:41) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:165) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:135) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.sql.util.ExecutionListenerBus.postToAll(QueryExecutionListener.scala:135) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.sql.util.ExecutionListenerBus.onOtherEvent(QueryExecutionListener.scala:147) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:12) ~[scala-library.jar:?]
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) ~[scala-library.jar:?]
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1447) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has failed: cannot initialize the compiler due to java.lang.NoSuchMethodError: scala.tools.reflect.package$$anon$4.INFO()Lscala/reflect/internal/Reporter$Severity;
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$api$.liftedTree1$1(ToolBoxFactory.scala:360) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$api$.compiler$lzycompute(ToolBoxFactory.scala:346) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$api$.compiler(ToolBoxFactory.scala:345) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.apply(ToolBoxFactory.scala:372) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.parse(ToolBoxFactory.scala:429) ~[scala-compiler-2.12.15.jar:?]
    at za.co.absa.spline.harvester.json.HarvesterJsonSerDe$.<init>(HarvesterJsonSerDe.scala:49) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.json.HarvesterJsonSerDe$.<clinit>(HarvesterJsonSerDe.scala) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    ... 71 more
Caused by: java.lang.NoSuchMethodError: scala.tools.reflect.package$$anon$4.INFO()Lscala/reflect/internal/Reporter$Severity;
    at scala.tools.reflect.package$$anon$4.<init>(package.scala:85) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.package$.frontEndToReporter(package.scala:77) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$api$.liftedTree1$1(ToolBoxFactory.scala:350) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$api$.compiler$lzycompute(ToolBoxFactory.scala:346) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$api$.compiler(ToolBoxFactory.scala:345) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.apply(ToolBoxFactory.scala:372) ~[scala-compiler-2.12.15.jar:?]
    at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.parse(ToolBoxFactory.scala:429) ~[scala-compiler-2.12.15.jar:?]
    at za.co.absa.spline.harvester.json.HarvesterJsonSerDe$.<init>(HarvesterJsonSerDe.scala:49) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    at za.co.absa.spline.harvester.json.HarvesterJsonSerDe$.<clinit>(HarvesterJsonSerDe.scala) ~[spark-3.3-spline-agent-bundle_2.12-1.0.6.jar:?]
    ... 71 more
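The root `NoSuchMethodError` is thrown inside `scala-compiler-2.12.15.jar` while Spline's `HarvesterJsonSerDe` initializes a Scala reflection ToolBox. One common cause of this kind of error (an assumption here; the thread itself only points to #602) is a `scala-compiler` jar whose version differs from the `scala-reflect` or `scala-library` jar on the same classpath. The sketch below is a hypothetical diagnostic helper, not part of Spline or Glue: it extracts the `~[jar:version]` annotations from a stack trace like the one above and reports the versions encoded in the Scala jar file names.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Matches the "~[file.jar:version]" suffix on each frame,
# e.g. "~[scala-compiler-2.12.15.jar:?]".
FRAME_JAR = re.compile(r"~\[([^:\]]+\.jar):[^\]]*\]")
# Splits "name-1.2.3.jar" into name and version; unversioned jars give None.
JAR_VERSION = re.compile(r"^(?P<name>.+?)(?:-(?P<version>\d+(?:\.\d+)+))?\.jar$")
SCALA_ARTIFACTS = ("scala-library", "scala-reflect", "scala-compiler")


def jars_in_trace(trace: str) -> Set[str]:
    """Return the distinct jar files referenced by the stack frames."""
    return set(FRAME_JAR.findall(trace))


def scala_jar_versions(jars: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each Scala artifact found to the version in its file name (or None)."""
    versions: Dict[str, Optional[str]] = {}
    for jar in jars:
        m = JAR_VERSION.match(jar)
        if m and m.group("name") in SCALA_ARTIFACTS:
            versions[m.group("name")] = m.group("version")
    return versions


def has_version_mismatch(versions: Dict[str, Optional[str]]) -> bool:
    """True if the file names reveal more than one distinct Scala version."""
    known = {v for v in versions.values() if v is not None}
    return len(known) > 1
```

Because the trace shows Glue's `scala-library.jar` without a version in its name, a result of "no mismatch" is not conclusive; the next step would be to inspect the jars actually present on the Glue classpath.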

cerveada commented 1 year ago

This is most probably a duplicate of #602.

imjaleel commented 1 year ago

Yes @cerveada, it's the same issue. I searched for similar issues but somehow missed that one. Thank you for pointing it out.

However, I can't access the JAR file referred to in that issue. Could you please share that JAR so I can build my solution around it until 1.1.0 is publicly released?