AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

java.lang.NoClassDefFoundError (#586)

Closed · dulangaheshan closed this issue 1 year ago

dulangaheshan commented 1 year ago

I'm trying to do some R&D on Spline on my local machine. When I run spark-submit ./schema.py on its own, it works fine and shows the output.

As per the documentation, I run the command below:

spark-submit \
    --packages za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.12:0.5.6 \
    --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
    --conf "spark.spline.produer.url=http:localhost:9090/producer" \
    ./schema.py

I'm getting the error below:

23/01/27 02:30:05 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/d5han/Documents/spark/pyspark-examples-master/spark-warehouse').
23/01/27 02:30:05 INFO SharedState: Warehouse path is 'file:/path/spark-warehouse'.
Traceback (most recent call last):
  File "/path/./schema.py", line 14, in <module>
    spark = SparkSession.builder.master("local[1]").appName('SparkByExamples.com').getOrCreate()
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 233, in getOrCreate
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.sessionState.
: java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
    at za.co.absa.spline.harvester.conf.DefaultSplineConfigurer$.apply(DefaultSplineConfigurer.scala:62)
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$.za$co$absa$spline$harvester$listener$SplineQueryExecutionListener$$constructEventHandler(SplineQueryExecutionListener.scala:65)
    at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.<init>(SplineQueryExecutionListener.scala:37)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2777)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2766)
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$1(QueryExecutionListener.scala:84)
    at org.apache.spark.sql.util.ExecutionListenerManager.$anonfun$new$1$adapted(QueryExecutionListener.scala:83)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.sql.util.ExecutionListenerManager.<init>(QueryExecutionListener.scala:83)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$listenerManager$2(BaseSessionStateBuilder.scala:319)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.listenerManager(BaseSessionStateBuilder.scala:319)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:346)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1142)
    at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:156)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:152)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:149)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 40 more

23/01/27 02:30:05 INFO SparkContext: Invoking stop() from shutdown hook
23/01/27 02:30:05 INFO SparkUI: Stopped Spark web UI at http://192.168.8.103:4040
23/01/27 02:30:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/01/27 02:30:05 INFO MemoryStore: MemoryStore cleared
23/01/27 02:30:05 INFO BlockManager: BlockManager stopped
23/01/27 02:30:05 INFO BlockManagerMaster: BlockManagerMaster stopped
23/01/27 02:30:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/01/27 02:30:05 INFO SparkContext: Successfully stopped SparkContext
23/01/27 02:30:05 INFO ShutdownHookManager: Shutdown hook called
23/01/27 02:30:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-6f4179ff-dfc8-43bb-92d3-699bd14d3a57/pyspark-b8855d3e-1ef2-442d-bfc8-1a04606fe985
23/01/27 02:30:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-44fff210-b202-4ca9-bf83-b9be10bfdc72
23/01/27 02:30:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-6f4179ff-dfc8-43bb-92d3-699bd14d3a57

My local Spark version is 3.1.1 and my Python version is 3.7.6.

wajda commented 1 year ago

I guess you took that from some older article or blog post on the internet. Spline 0.5 is a more than two-year-old version that doesn't support Spark 3.1 (I think it was even released before Spark 3.1 was). You need to use a Spline agent version that is compatible with your Spark version: https://github.com/AbsaOSS/spline-spark-agent#selecting-artifact
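
For a Spark 3.1 / Scala 2.12 setup, the corrected command would look roughly like the sketch below. The bundle coordinate follows the agent's spark-<sparkVersion>-spline-agent-bundle naming scheme from the compatibility matrix linked above; the 0.7.13 version shown is only illustrative, so check the README for the latest release matching your cluster. Note the producer property is spark.spline.producer.url and the URL needs the // after the scheme.

spark-submit \
    --packages za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.7.13 \
    --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
    --conf "spark.spline.producer.url=http://localhost:9090/producer" \
    ./schema.py

This also assumes a Spline Producer REST endpoint is actually listening on localhost:9090; if it isn't, the agent will start but fail when posting lineage.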