Taurus-Le opened this issue 2 years ago
There should be one ticket per issue, so I will create a new one for the `java.util.NoSuchElementException: None.get` problem.
@Taurus-Le I split the issue in two: this one is for `delta` and #479 is for `es`. Feel free to add any relevant info or correct me if I split anything incorrectly.
Hi @cerveada, sorry for the trouble. I meant to save you some work, not create more. Thanks for helping.
Version of Apache Maven and JDK used to build spline-spark-agent:

```
[hadoop@h8 spline-spark-agent]$ mvn -version
Apache Maven 3.8.5 (3599d3414f046de2324203b78ddcf9b5e4388aa0)
Maven home: /home/hadoop/SW/apache-maven-3.8.5
Java version: 1.8.0_331, vendor: Oracle Corporation, runtime: /home/hadoop/SW/jdk1.8.0_331/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-1062.el7.x86_64", arch: "amd64", family: "unix"
```
And here is how I build the spline-spark-agent:

```shell
git clone https://github.com/AbsaOSS/spline-spark-agent.git
cd spline-spark-agent
git checkout release/0.7.10
mvn scala-cross-build:change-version -Pscala-2.12
mvn clean package -Pscala-2.12,spark-3.2 -Dmaven.test.skip=true
```
It seems that changes in Delta Lake 1.2.0 are causing the issue. The code was tested only on 1.1.0, so if you want a workaround, try switching to version 1.1.0.
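If the application pulls Delta Lake in through Maven, pinning the version back could look like the fragment below. This is only a sketch: the coordinates are the standard Delta Lake artifact for Scala 2.12, but adjust it to wherever your build actually declares the dependency.

```xml
<!-- Hypothetical pom.xml fragment: pin Delta Lake back to 1.1.0 -->
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-core_2.12</artifactId>
  <version>1.1.0</version>
</dependency>
```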
Got it, thanks. If there's anything I can help with, please let me know. Also, I just noticed that Delta Lake 2.0.0 has been released.
Hi @cerveada, I've switched from Delta Lake 1.2.0 to 1.1.0, and this time I got the same error as #479.
```
22/07/22 09:42:46 ERROR SplineQueryExecutionListener: Unexpected error occurred during lineage processing for application: dim_t_device_mapping_test #application_1658293063172_0023
java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at za.co.absa.spline.harvester.postprocessing.ViewAttributeAddingFilter.toAttributeReferencesMap(ViewAttributeAddingFilter.scala:59)
	at za.co.absa.spline.harvester.postprocessing.ViewAttributeAddingFilter.$anonfun$addMissingAttributeLinks$1(ViewAttributeAddingFilter.scala:39)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at za.co.absa.spline.harvester.postprocessing.ViewAttributeAddingFilter.addMissingAttributeLinks(ViewAttributeAddingFilter.scala:39)
	at za.co.absa.spline.harvester.postprocessing.ViewAttributeAddingFilter.processExecutionPlan(ViewAttributeAddingFilter.scala:34)
	at za.co.absa.spline.harvester.postprocessing.PostProcessor.$anonfun$process$4(PostProcessor.scala:38)
	at za.co.absa.spline.harvester.postprocessing.PostProcessor.$anonfun$provideCtx$1(PostProcessor.scala:25)
	at scala.Function1.$anonfun$andThen$1(Function1.scala:57)
	at scala.Function1.$anonfun$andThen$1(Function1.scala:57)
	at za.co.absa.spline.harvester.postprocessing.PostProcessor.process(PostProcessor.scala:38)
	at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$4(LineageHarvester.scala:110)
	at scala.Option.flatMap(Option.scala:271)
	at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:64)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler.onSuccess(QueryExecutionEventHandler.scala:42)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$2(SplineQueryExecutionListener.scala:40)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$2$adapted(SplineQueryExecutionListener.scala:40)
	at scala.Option.foreach(Option.scala:407)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1(SplineQueryExecutionListener.scala:40)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.withErrorHandling(SplineQueryExecutionListener.scala:49)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:40)
	at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:158)
	at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:128)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.sql.util.ExecutionListenerBus.postToAll(QueryExecutionListener.scala:128)
	at org.apache.spark.sql.util.ExecutionListenerBus.onOtherEvent(QueryExecutionListener.scala:140)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
```
So I think the problem might have something to do with the call to `createOrReplaceTempView` in my code, which internally calls `createTempViewCommand` and then `CreateViewCommand`. According to the README of spline-spark-agent, `CreateViewCommand` should be ignored.
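For what it's worth, a `None.get` like the one in the trace is what Scala throws when code calls `.get` on an `Option` that turned out to be `None`. A minimal, self-contained sketch (the map and keys below are made up for illustration; they are not Spline's actual data structures):

```scala
object NoneGetDemo extends App {
  // Hypothetical attribute-ID lookup table (illustrative names only)
  val attributeById: Map[String, String] = Map("attr-1" -> "device_id")

  // A lookup that hits works fine:
  println(attributeById.get("attr-1").get) // prints device_id

  // A lookup that misses yields None; calling .get on it would throw
  // java.util.NoSuchElementException: None.get -- presumably what happens
  // inside ViewAttributeAddingFilter.toAttributeReferencesMap when an
  // attribute ID is absent from its map.
  val miss: Option[String] = attributeById.get("attr-2")
  assert(miss.isEmpty) // miss.get would throw NoSuchElementException
}
```

So the question above reduces to: which plan shape (here, the one produced via `createOrReplaceTempView`) makes that lookup miss.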
Hmm, I read the logs again and there seems to be an issue with Spark stopping:
```
java.lang.reflect.InvocationTargetException
Caused by: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
```
It looks like Spline is trying to call the `DeltaTableV2.properties` method, but the Spark context is already stopped. Can you try not stopping Spark and see if it changes the outcome?
Hi @cerveada, I did not stop Spark myself; I never called `stop()` on the SparkContext or SparkSession. Could it be because I'm running Spark on YARN? I previously suspected a connection between Spark stopping and dynamic allocation, so I disabled dynamic allocation, but it did not help.
And I'm truly sorry, I forgot to tell you that I got the same error as https://github.com/AbsaOSS/spline-spark-agent/issues/479#issuecomment-1196398326 even after applying PR https://github.com/AbsaOSS/spline-spark-agent/pull/481.
I don't know what it might be.
On YARN do you run in local mode or cluster mode?
Do you still get the following errors?
```
java.lang.reflect.InvocationTargetException
Caused by: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
```
You can also try setting a higher `hadoop.service.shutdown.timeout`.
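For reference, that Hadoop property is typically set in `core-site.xml`; a sketch is below. The `60s` value is only an example, not a recommendation; check the default for your Hadoop version before changing it.

```xml
<!-- core-site.xml fragment: allow services more time to shut down cleanly -->
<property>
  <name>hadoop.service.shutdown.timeout</name>
  <value>60s</value>
</property>
```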
Version:
Description: The app reads from multiple Delta Lake tables and writes the output of a join as a new Delta Lake table. Spline initialized successfully, but an unexpected error occurred.
Error logs: delta2delta-1.log delta2delta-2.log delta2delta-3.log
Error information:
Additional information:
I've modified the pom.xml under the root of spline-spark-agent and under spline-spark-agent/core:
I've disabled Spark dynamic allocation on the suspicion that the SparkContext might be shut down due to a resource problem. Below is the content of my spark-defaults.conf.
```
# dynamic allocation
spark.shuffle.service.enabled true
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors
spark.dynamicAllocation.schedulerBacklogTimeout 5

# Lineage
spark.sql.queryExecutionListeners za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.mode REQUIRED
spark.spline.producer.url http://h8:9095/spline-rest/producer
spark.spline.postProcessingFilter.composite.filters dsPasswordReplace
spark.spline.lineageDispatcher console
spark.spline.lineageDispatcher.console.className za.co.absa.spline.harvester.dispatcher.ConsoleLineageDispatcher
spark.spline.lineageDispatcher.console.stream ERR
spark.spline.lineageDispatcher.http.producer.url http://h8:9095/spline-rest/producer
```
```
# Kafka dispatcher
spline.lineageDispatcher=kafka
spline.lineageDispatcher.kafka.className=za.co.absa.spline.harvester.dispatcher.KafkaLineageDispatcher

# producer configs as defined by Kafka (bootstrap.servers, key.serializer, etc.);
# all Kafka configs are supported
spline.lineageDispatcher.kafka.producer.bootstrap.servers=h5:9092,h6:9092,h7:9092
spline.lineageDispatcher.kafka.producer.key.serializer=org.apache.kafka.common.serialization.StringSerializer
spline.lineageDispatcher.kafka.producer.value.serializer=org.apache.kafka.common.serialization.StringSerializer
spline.lineageDispatcher.kafka.producer.max.in.flight.requests.per.connection=1

# topic name for plans and events
spline.lineageDispatcher.kafka.topic=spline-lineage
```