
Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager #864

Open radhikabajaj123 opened 2 months ago

radhikabajaj123 commented 2 months ago

Hello,

I am getting the following exception when running spark-submit:

```
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1780)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:67)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:429)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:83)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:467)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:232)
	at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2770)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:433)
	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:320)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:478)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
	... 4 more
```

These are the configurations I am using for spark-submit:

```
--deploy-mode cluster \
--driver-memory 32g \
--executor-memory 128g \
--executor-cores 18 \
--driver-cores 8 \
--num-executors 3 \
--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
--conf spark.yarn.populateHadoopClasspath=false \
--conf spark.yarn.archive=$BENCH_HOME/$BENCH_DISTR.tgz \
--jars /root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
--conf spark.driver.extraClassPath=/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
--conf spark.executor.extraClassPath=/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
--conf spark.comet.enabled=true \
--conf spark.comet.exec.enabled=true \
--conf spark.comet.exec.all.enabled=true \
--conf spark.comet.explainFallback.enabled=true \
--conf spark.comet.cast.allowIncompatible=true \
--conf spark.comet.exec.shuffle.enabled=true \
--conf spark.comet.exec.shuffle.mode=auto \
--conf spark.comet.shuffle.enforceMode.enabled=true \
--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
```

Does anyone have any insights as to what might be causing the error?

andygrove commented 2 months ago

I see that you are submitting multiple jars. One uses an absolute path under /root while the others use relative paths, which may not be what you intended:

Also, there is no need to submit the source jars or test source jars.

Could you try submitting just comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar using an absolute path?

```
/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,
./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,
./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,
./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,
./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
```
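A minimal invocation along these lines might look like the following sketch. The jar path is taken from the original report; `your-app.jar` and the trimmed-down set of `--conf` options are placeholders, not a verified working configuration:

```shell
# Sketch: submit only the main Comet jar by absolute path, dropping the
# -sources, -test-sources, and original-* artifacts, which are not needed
# at runtime.
COMET_JAR=/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar

spark-submit \
  --deploy-mode cluster \
  --jars "$COMET_JAR" \
  --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
  --conf spark.comet.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  your-app.jar   # placeholder for the actual application jar
```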
radhikabajaj123 commented 2 months ago

I had tried submitting just comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar using an absolute path, and that also gave the same error.

nblagodarnyi commented 1 month ago

spark.[driver|executor].extraClassPath should be a colon-separated list of local jars with absolute local paths. spark-submit silently ignores errors in this config; that's why Spark cannot find the class on its classpath. This example works for me:

```shell
export JARS_LOCAL="/opt/spark-3.5.1/jars_ext/comet-spark-spark3.5_2.12-0.2.0-SNAPSHOT-210824.jar:/opt/spark-3.5.1/jars_ext/spark-metrics-3.5-1.0.0.jar"
spark-shell \
  ...
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.driver.extraClassPath=$JARS_LOCAL \
  --conf spark.executor.extraClassPath=$JARS_LOCAL
  ...
```
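Since spark-submit will not warn about a malformed value, one way to build and sanity-check such a colon-separated list before submitting is a small helper like this (a sketch; the `/opt/jars/...` paths are placeholders):

```shell
# Build a colon-separated classpath from jar paths, rejecting any
# non-absolute entry, since extraClassPath requires absolute local paths.
build_classpath() {
  local out="" jar
  for jar in "$@"; do
    case "$jar" in
      /*) ;;                                   # absolute path: accepted
      *)  echo "not absolute: $jar" >&2; return 1 ;;
    esac
    out="${out:+$out:}$jar"                    # join with ':' (Linux delimiter)
  done
  printf '%s\n' "$out"
}

build_classpath /opt/jars/a.jar /opt/jars/b.jar
# prints: /opt/jars/a.jar:/opt/jars/b.jar
```

The result can then be exported as JARS_LOCAL and passed to both spark.driver.extraClassPath and spark.executor.extraClassPath.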
radhikabajaj123 commented 1 month ago

Hi Nikita, thanks for the reply!

I am receiving the same error when I try submitting a single jar comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar using an absolute local path.

viirya commented 1 month ago

That doesn't make sense. I also don't think this is related to Comet. Based on what you described, it seems you cannot include any third-party classes through the --jars config at all.

Are you able to have any jar other than Comet in --jars and import any class from it?
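A quick way to run that check might be the following sketch; the jar path and class name are hypothetical stand-ins for whatever third-party jar is at hand:

```shell
# Sketch: verify that --jars works at all, independent of Comet, by
# loading an unrelated third-party jar and importing a class from it.
spark-shell --jars /tmp/some-third-party.jar
# then inside the shell:
#   scala> import com.example.SomeClass   // hypothetical class from that jar
```

If that import also fails, the problem is with jar distribution in the cluster setup rather than with Comet itself.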

nblagodarnyi commented 1 month ago

@radhikabajaj123 note that this local jar (with local path) should be present on all worker nodes of your cluster.
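For example, one rough way to generate the copy commands for every worker is a loop like this (a dry-run sketch; `worker1`..`worker3` are placeholder hostnames, and the output can be piped to `sh` to actually run the copies):

```shell
# Sketch: emit one scp command per worker so the Comet jar ends up at the
# same absolute path everywhere, matching what extraClassPath expects.
JAR=/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar
for host in worker1 worker2 worker3; do
  echo "scp $JAR $host:$JAR"
done
```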

xhumanoid commented 1 month ago

@radhikabajaj123

spark.[driver|executor].extraClassPath is appended directly to the JVM classpath, so it comes with restrictions:

  1. the paths must point to libraries that are actually present on the machines
  2. the classpath must use the OS-specific delimiter, which on Linux is :

Every time you use spark-submit, all libraries from --jars are shipped to the local working directory, so you don't need to provide a relative path. Possible options:

  1. distribute the files across the cluster yourself, then reference those absolute local paths in spark.[driver|executor].extraClassPath
  2. remove spark.[driver|executor].extraClassPath from your spark-submit entirely and include all necessary jars in the --jars parameter

For testing I recommend option 2, but for production option 1 is better, because you can rely on the yarn-site.xml config and include these jars in the classpath by default.
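Option 2 above might look like the following sketch; `your-app.jar` is a placeholder and the `--conf` list is trimmed to the essentials rather than being a verified configuration:

```shell
# Option 2 sketch: let --jars distribute the single Comet jar to the
# cluster; no driver/executor extraClassPath settings at all.
spark-submit \
  --deploy-mode cluster \
  --jars /root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
  --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  your-app.jar   # hypothetical application jar
```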