AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

Error when running spark application with agent-core as queryExecutionListeners in java 17 #782

Open · opened by GallegoOpen 5 months ago

GallegoOpen commented 5 months ago

Hi,

I have an AWS EMR v6.11 cluster (Spark 3.3.2, per the jar versions in the stack trace below) where I want to run a Spark application via spark-submit, with Java 17 as the runtime. My application uses the Spline agent as a dependency:

        <dependency>
            <groupId>za.co.absa.spline.agent.spark</groupId>
            <artifactId>agent-core_2.12</artifactId>
            <version>1.1.0</version>
        </dependency>

When launching my Spark job, I add the property --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener. The job promptly fails with the following error:
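For reference, the full submit command looks roughly like this (the application class and jar name are placeholders, not taken from my actual job):

```shell
# Sketch of the spark-submit invocation that triggers the failure.
# com.example.MyApp and my-application.jar are hypothetical placeholders.
spark-submit \
  --class com.example.MyApp \
  --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
  my-application.jar
```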

Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make public void sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream.close() throws java.io.IOException accessible: module java.base does not "opens sun.net.www.protocol.jar" to unnamed module @2a693f59
    at java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) ~[?:?]
    at java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) ~[?:?]
    at java.lang.reflect.Method.checkCanSetAccessible(Method.java:199) ~[?:?]
    at java.lang.reflect.Method.setAccessible(Method.java:193) ~[?:?]
    at scala.reflect.package$.ensureAccessible(package.scala:65) ~[scala-library-2.12.15.jar:?]
    at scala.runtime.ScalaRunTime$.ensureAccessible(ScalaRunTime.scala:162) ~[scala-library-2.12.15.jar:?]
    at za.co.absa.commons.lang.ARM$.reflMethod$Method1(ARM.scala:32) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.commons.lang.ARM$.using(ARM.scala:32) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.conf.YAMLConfiguration.<init>(YAMLConfiguration.scala:37) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.conf.StandardSplineConfigurationStack$.defaultConfig(StandardSplineConfigurationStack.scala:44) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.SparkLineageInitializer.createListener(SparkLineageInitializer.scala:123) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.<init>(SplineQueryExecutionListener.scala:37) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at jdk.internal.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
    at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
    at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:3026) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290) ~[scala-library-2.12.15.jar:?]
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:3015) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$2(QueryExecutionListener.scala:90) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:161) ~[spark-catalyst_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$1(QueryExecutionListener.scala:90) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$1$adapted(QueryExecutionListener.scala:88) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.util.ExecutionListenerManager.<init>(QueryExecutionListener.scala:88) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$listenerManager$2(BaseSessionStateBuilder.scala:339) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.listenerManager(BaseSessionStateBuilder.scala:339) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:367) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1175) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    ... 34 more

Looking for more info, I found a separate issue where another user hit the same error; I believe it was resolved by switching to Java 11 as the runtime. In the same thread it was suggested to add a JVM option, but that doesn't seem to solve the problem for me, perhaps because the JVM is already up and running by the time Spark's extraJavaOptions are parsed?
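For reference, the kind of JVM option that targets this class of InaccessibleObjectException is an --add-opens flag for the package named in the error message. What I tried looks roughly like this (my-application.jar is a placeholder, and I haven't confirmed this combination works on EMR 6.11; in client mode the driver JVM is already running when spark.driver.extraJavaOptions is read, which is why --driver-java-options is used for the driver side here):

```shell
# Sketch: open sun.net.www.protocol.jar to the unnamed module on both
# the driver and executor JVMs. The package name comes directly from
# the InaccessibleObjectException above.
spark-submit \
  --driver-java-options "--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED" \
  --conf "spark.executor.extraJavaOptions=--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED" \
  --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
  my-application.jar
```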

Is there any way to use the library with Java 17 as the JRE?

Thanks in advance