AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

Error when running Spark application with agent-core as queryExecutionListeners in Java 17 #782

Open · GallegoOpen opened this issue 10 months ago

GallegoOpen commented 10 months ago

Hi,

I have an AWS EMR v6.11 cluster (Spark 3.3.2, as the stack trace below shows) on which I want to run a Spark application via spark-submit, with a Java 17 runtime. My application uses the Spline agent as a dependency:

        <dependency>
            <groupId>za.co.absa.spline.agent.spark</groupId>
            <artifactId>agent-core_2.12</artifactId>
            <version>1.1.0</version>
        </dependency>

When launching my Spark job, I add the Spark property --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener (see the sketch below).
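
A minimal spark-submit sketch of the setup just described; the main class and jar names are hypothetical placeholders, not taken from the actual job:

    # Register the Spline listener at submit time (placeholder app coordinates).
    spark-submit \
      --class com.example.MyApp \
      --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
      my-app.jar

The job promptly fails with the following error: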

Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make public void sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream.close() throws java.io.IOException accessible: module java.base does not "opens sun.net.www.protocol.jar" to unnamed module @2a693f59
    at java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) ~[?:?]
    at java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) ~[?:?]
    at java.lang.reflect.Method.checkCanSetAccessible(Method.java:199) ~[?:?]
    at java.lang.reflect.Method.setAccessible(Method.java:193) ~[?:?]
    at scala.reflect.package$.ensureAccessible(package.scala:65) ~[scala-library-2.12.15.jar:?]
    at scala.runtime.ScalaRunTime$.ensureAccessible(ScalaRunTime.scala:162) ~[scala-library-2.12.15.jar:?]
    at za.co.absa.commons.lang.ARM$.reflMethod$Method1(ARM.scala:32) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.commons.lang.ARM$.using(ARM.scala:32) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.conf.YAMLConfiguration.<init>(YAMLConfiguration.scala:37) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.conf.StandardSplineConfigurationStack$.defaultConfig(StandardSplineConfigurationStack.scala:44) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.SparkLineageInitializer.createListener(SparkLineageInitializer.scala:123) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.<init>(SplineQueryExecutionListener.scala:37) ~[hermes-import-8.4.30-jar-with-dependencies.jar:?]
    at jdk.internal.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
    at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
    at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:3026) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290) ~[scala-library-2.12.15.jar:?]
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:3015) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$2(QueryExecutionListener.scala:90) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:161) ~[spark-catalyst_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$1(QueryExecutionListener.scala:90) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$1$adapted(QueryExecutionListener.scala:88) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.util.ExecutionListenerManager.<init>(QueryExecutionListener.scala:88) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$listenerManager$2(BaseSessionStateBuilder.scala:339) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.listenerManager(BaseSessionStateBuilder.scala:339) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:367) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1175) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    ... 34 more

Searching for more info, I found a separate issue where another user hit the same error; I believe it was resolved by switching to Java 11 as the runtime. In the same thread it was suggested to add a JVM option, but that doesn't seem to solve the problem for me, perhaps because the driver JVM is already up and running by the time spark.driver.extraJavaOptions is read? A sketch of the workaround as I understand it follows.
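
A sketch of the suggested workaround, assuming it refers to the JDK --add-opens mechanism (the exact option from the other thread is not quoted here, and the jar name is a placeholder):

    # Option A: pass the JVM flag as a Spark conf at submit time.
    spark-submit \
      --conf "spark.driver.extraJavaOptions=--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED" \
      my-app.jar

    # Option B: use the dedicated driver flag of spark-submit.
    spark-submit \
      --driver-java-options "--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED" \
      my-app.jar

As I understand it, spark.driver.extraJavaOptions only affects the driver JVM when it is supplied before launch (on the spark-submit command line or in spark-defaults.conf), not when set from inside the already-running application.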

Is there a way to use the library with Java 17 as the JRE?

Thanks in advance

dmartinb06 commented 1 month ago

We solved this problem by setting this Spark property:

    spark.driver.extraJavaOptions=--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED
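
For completeness, a sketch of a full spark-submit invocation combining this fix with the listener registration; the executor-side --add-opens line and the app coordinates are assumptions, not part of the original comment:

    # Driver-side fix from this comment; the executor-side equivalent is
    # speculative, in case the agent also runs reflective code on executors.
    spark-submit \
      --class com.example.MyApp \
      --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
      --conf "spark.driver.extraJavaOptions=--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED" \
      --conf "spark.executor.extraJavaOptions=--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED" \
      my-app.jar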