Open · radhikabajaj123 opened this issue 2 months ago
@radhikabajaj123: I recently installed the setup successfully. Can you write down the steps you followed, so I can understand the issue?
Hi @nitin-kalyankar25 ,
I followed the steps on this page: https://datafusion.apache.org/comet/user-guide/installation.html#building-from-source (the same steps worked previously, but they are not working now).
git clone https://github.com/apache/datafusion-comet.git
make release PROFILES="-Pspark-3.4 -Pscala-2.13"
Set the $COMET_JAR and $SPARK_HOME environment variables, then run:
$SPARK_HOME/bin/spark-shell \
  --jars $COMET_JAR \
  --conf spark.driver.extraClassPath=$COMET_JAR \
  --conf spark.executor.extraClassPath=$COMET_JAR \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.explainFallback.enabled=true \
  --conf spark.driver.memory=1g \
  --conf spark.executor.memory=1g
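For reference, here is roughly how I set those two variables (the artifact name under spark/target depends on the chosen profiles, so treat these paths as illustrative rather than exact):
# Illustrative paths -- adjust to your Spark install and the actual build output
export SPARK_HOME=/path/to/spark-3.4.3-bin-hadoop3
export COMET_JAR="$(ls "$PWD"/spark/target/comet-spark-*-SNAPSHOT.jar)"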
@radhikabajaj123:
As your error indicates, java.lang.ClassNotFoundException: org.apache.spark.CometPlugin is thrown while initializing the Spark session, meaning the CometPlugin class is not being found. This typically happens when the JAR containing the class is either missing or not properly referenced in your project.
1. Check the JAR path: ensure you are passing the correct path to the JAR file; the JAR should be the *-SNAPSHOT.jar.
2. Clean existing builds: delete the existing JAR if it was only partially created.
3. Build with profiles: rebuild your project with the following command:
make release PROFILES="-Pspark-3.4 -Pscala-2.13"
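To confirm that the rebuilt JAR actually contains the plugin class before launching, a quick check like this should print the class entry (assuming the JDK's jar tool is on your PATH):
# Should list org/apache/spark/CometPlugin.class if the build is complete
jar tf "$COMET_JAR" | grep CometPlugin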
The command below worked fine for me:
export COMET_JAR=<*-SNAPSHOT.jar>
./spark-shell \
--jars $COMET_JAR \
--conf spark.driver.extraClassPath=$COMET_JAR \
--conf spark.executor.extraClassPath=$COMET_JAR \
--conf spark.plugins=org.apache.spark.CometPlugin \
--conf spark.comet.enabled=true \
--conf spark.comet.exec.enabled=true \
--conf spark.comet.explainFallback.enabled=true \
--conf spark.comet.exec.shuffle.mode=jvm \
--conf spark.executor.memory=1g \
--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
--conf spark.comet.exec.shuffle.enabled=true
@nitin-kalyankar25 Deleting the datafusion-comet project and then repeating the steps from before should take care of cleaning existing builds (step 2), right?
@radhikabajaj123 : Not exactly. Try deleting the JAR files from the /spark/target/
directory, where the JAR is created after running the build command.
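For example, something along these lines from the repo root (paths assume the default Maven layout):
ls spark/target/*.jar     # inspect what the build produced
rm -f spark/target/*.jar  # drop stale or partially written JARs before rebuilding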
@nitin-kalyankar25
Hmmm, but when I delete the datafusion-comet project and clone it again, it doesn't contain a /spark/target
directory at all.
It's only when I run make release PROFILES="-Pspark-3.4 -Pscala-2.13"
again that the JARs are rebuilt and the target directory is created.
The path to the JAR is also correct.
It now gives me this error:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
    at java.lang.Class.getMethod0(Class.java:3018)
    at java.lang.Class.getMethod(Class.java:1784)
    at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more
@radhikabajaj123: Which Spark and Scala versions are you using in your Spark setup?
@nitin-kalyankar25 Spark 3.4.3 and Scala 2.13
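Note: a java.lang.NoClassDefFoundError for org/apache/hadoop/fs/FSDataInputStream at launch usually means the Hadoop client classes are missing from Spark's own classpath, independent of Comet. If this Spark 3.4.3 is the "without Hadoop" distribution, the fix documented in Spark's "Hadoop Free" build guide is:
# Only needed for spark-*-bin-without-hadoop distributions; the
# prebuilt -bin-hadoop3 packages already bundle the Hadoop classes.
export SPARK_DIST_CLASSPATH="$(hadoop classpath)"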
I faced the same issue when I tried to set HDFS locations for the extra classpaths, like
spark.[driver|executor].extraClassPath=hdfs:///foo/bar/comet.jar
AFAIU, these configs only support local files, and spark-submit silently ignores errors in them (such as missing JARs, or a comma-separated list where a semicolon-separated one is expected).
Also note that these configs override the settings in conf/spark-defaults. This is a possible cause of the FSDataInputStream issue.
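As a sketch of the workaround (the local path below is hypothetical; the point is that extraClassPath entries must be local files present on every node, while --jars handles shipping the JAR):
export COMET_JAR=/opt/comet/comet-spark-SNAPSHOT.jar  # hypothetical local path
$SPARK_HOME/bin/spark-shell \
  --jars $COMET_JAR \
  --conf spark.driver.extraClassPath=$COMET_JAR \
  --conf spark.executor.extraClassPath=$COMET_JAR \
  --conf spark.plugins=org.apache.spark.CometPlugin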
Hello,
I am trying to run the Spark shell with Comet enabled, following the configurations specified at https://datafusion.apache.org/comet/user-guide/installation.html#installing-datafusion-comet, after creating a local build.
I was previously able to launch the Spark shell successfully after cloning the Comet project; however, it now throws this exception:
    at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1968)
    at org.apache.spark.SparkContext.addJar(SparkContext.scala:2023)
    at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:507)
    at org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:507)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:926)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:507)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2740)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1026)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1020)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:114)
    at $line3.$read$$iw.<init>(<console>:5)
    at $line3.$read.<init>(<console>:4)
    at $line3.$read$.<clinit>(<console>:1)
    at $line3.$eval$.$print$lzycompute(<synthetic>:6)
    at $line3.$eval$.$print(<synthetic>:5)
    at $line3.$eval.$print(<synthetic>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:670)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
    at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$1(IMain.scala:506)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:43)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:505)
    at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$3(IMain.scala:519)
    at scala.tools.nsc.interpreter.IMain.doInterpret(IMain.scala:519)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:501)
    at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216)
    at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
    at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$interpretPreamble$1(ILoop.scala:924)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at scala.tools.nsc.interpreter.shell.ILoop.interpretPreamble(ILoop.scala:924)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$3(ILoop.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.shell.ILoop.echoOff(ILoop.scala:90)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$2(ILoop.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.IMain.withSuppressedSettings(IMain.scala:1420)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$1(ILoop.scala:954)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
    at scala.tools.nsc.interpreter.shell.ILoop.run(ILoop.scala:954)
    at org.apache.spark.repl.Main$.doMain(Main.scala:84)
    at org.apache.spark.repl.Main$.main(Main.scala:59)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/08/26 17:08:39 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: org.apache.spark.CometPlugin
    at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:75)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2946)
    at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118)
    at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105)
    at scala.collection.immutable.ArraySeq.flatMap(ArraySeq.scala:35)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2944)
    at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:207)
    at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:193)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2740)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1026)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1020)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:114)
    at $line3.$read$$iw.<init>(<console>:5)
    at $line3.$read.<init>(<console>:4)
    at $line3.$read$.<clinit>(<console>:1)
    at $line3.$eval$.$print$lzycompute(<synthetic>:6)
    at $line3.$eval$.$print(<synthetic>:5)
    at $line3.$eval.$print(<synthetic>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:670)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
    at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$1(IMain.scala:506)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:43)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:505)
    at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$3(IMain.scala:519)
    at scala.tools.nsc.interpreter.IMain.doInterpret(IMain.scala:519)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:501)
    at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216)
    at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
    at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$interpretPreamble$1(ILoop.scala:924)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at scala.tools.nsc.interpreter.shell.ILoop.interpretPreamble(ILoop.scala:924)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$3(ILoop.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.shell.ILoop.echoOff(ILoop.scala:90)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$2(ILoop.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.IMain.withSuppressedSettings(IMain.scala:1420)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$1(ILoop.scala:954)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
    at scala.tools.nsc.interpreter.shell.ILoop.run(ILoop.scala:954)
    at org.apache.spark.repl.Main$.doMain(Main.scala:84)
    at org.apache.spark.repl.Main$.main(Main.scala:59)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/08/26 17:08:39 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/08/26 17:08:39 INFO SparkUI: Stopped Spark web UI at http://n-chafqtlvmeh6ad1j7j7f3.workdaysuv.com:4040
24/08/26 17:08:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/08/26 17:08:39 INFO MemoryStore: MemoryStore cleared
24/08/26 17:08:39 INFO BlockManager: BlockManager stopped
24/08/26 17:08:39 INFO BlockManagerMaster: BlockManagerMaster stopped
24/08/26 17:08:39 WARN MetricsSystem: Stopping a MetricsSystem that is not running
24/08/26 17:08:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/08/26 17:08:39 INFO SparkContext: Successfully stopped SparkContext
24/08/26 17:08:39 ERROR Main: Failed to initialize Spark session.
java.lang.ClassNotFoundException: org.apache.spark.CometPlugin
    at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:75)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2946)
    at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118)
    at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105)
    at scala.collection.immutable.ArraySeq.flatMap(ArraySeq.scala:35)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2944)
    at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:207)
    at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:193)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2740)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1026)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1020)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:114)
    at $line3.$read$$iw.<init>(<console>:5)
    at $line3.$read.<init>(<console>:4)
    at $line3.$read$.<clinit>(<console>:1)
    at $line3.$eval$.$print$lzycompute(<synthetic>:6)
    at $line3.$eval$.$print(<synthetic>:5)
    at $line3.$eval.$print(<synthetic>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:670)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
    at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$1(IMain.scala:506)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:43)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:505)
    at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$3(IMain.scala:519)
    at scala.tools.nsc.interpreter.IMain.doInterpret(IMain.scala:519)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:501)
    at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216)
    at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
    at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$interpretPreamble$1(ILoop.scala:924)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at scala.tools.nsc.interpreter.shell.ILoop.interpretPreamble(ILoop.scala:924)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$3(ILoop.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.shell.ILoop.echoOff(ILoop.scala:90)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$2(ILoop.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.IMain.withSuppressedSettings(IMain.scala:1420)
    at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$1(ILoop.scala:954)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
    at scala.tools.nsc.interpreter.shell.ILoop.run(ILoop.scala:954)
    at org.apache.spark.repl.Main$.doMain(Main.scala:84)
    at org.apache.spark.repl.Main$.main(Main.scala:59)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)