Closed holdenk closed 1 month ago
I suspect that the correct fix is a documentation note in the README (maybe plus a try/catch in the code that prints a reference to the README), since changing the Spark class loader is not easy (I also tried with the user-classpath-first class loader). If folks agree, I'm happy to make a PR.
We could also (maybe?) get at Spark's internal class loader and explicitly use it, but that also seems very hack-ey.
You can work around this error by copying the arrow datafusion comet jar into Spark's jars directory instead of adding it with --jars, so it is loaded by the same class loader.
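That workaround might look something like this sketch (SPARK_HOME and the jar path are assumptions based on the repro command in this issue; adjust for your layout):

```shell
# Copy the Comet jar into Spark's own jars directory so it is loaded
# by the same class loader as Spark's built-in shuffle classes.
cp spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar "$SPARK_HOME/jars/"

# Then launch WITHOUT passing the Comet jar via --jars:
"$SPARK_HOME/bin/spark-sql" \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  --conf spark.comet.exec.shuffle.enabled=true
```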
Emmm, it could be a potential solution, but it seems a bit inconvenient. In my understanding, it usually requires extra effort to change Spark's jar directory/archive in a production environment.
> since changing the Spark class loader is not easy (I also tried with user classpath first class loader)

So did this issue occur regardless of the spark.driver.userClassPathFirst setting being true or false?
> You can work around this error by copying the arrow datafusion comet jar into Spark's jars directory instead of adding it with --jars, so it is loaded by the same class loader.
>
> Emmm, it could be a potential solution, but it seems a bit inconvenient. In my understanding, it usually requires extra effort to change Spark's jar directory/archive in a production environment.
True, especially for users of a vendor solution, although for my deployments this isn't a big deal (we package our own Spark version anyway).

Let me take another look next week and see if there is a way to get loaded with Spark's default class loader.
> > since changing the Spark class loader is not easy (I also tried with user classpath first class loader)
>
> So did this issue occur regardless of the spark.driver.userClassPathFirst setting being true or false?

Yup :( So far I have only tried on vanilla Spark 3.4.
> Let me take another look next week and see if there is a way to get loaded with Spark's default class loader.

Thanks for working on this.
Another option that came to my mind would be shading into Comet's jar, and renaming, the package-scoped, shuffle-related classes, such as org.apache.spark.shuffle.sort.ShuffleInMemorySorter -> org.apache.comet.shaded.ShuffleInMemorySorter. It should be doable, but it seems very hack-ey too.
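If we went that route, the relocation could be sketched with the maven-shade-plugin. This is only a hypothetical fragment, not Comet's actual build config; the real set of classes to relocate would need auditing:

```xml
<!-- Hypothetical maven-shade-plugin relocation sketch -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Bundle copies of the package-scoped shuffle classes under a Comet namespace -->
        <pattern>org.apache.spark.shuffle.sort</pattern>
        <shadedPattern>org.apache.comet.shaded.shuffle.sort</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```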
Following on, I tried adding --driver-class-path as well and it did the trick. So what I would propose is updating the docs to include --driver-class-path, and maybe adding a try/catch around the part that hits the error, logging a message to indicate the fix. WDYT @advancedxy ?
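Concretely, the launch command would pass the Comet jar both ways, something like this sketch (the jar path is taken from the repro command in this issue; adjust for your checkout):

```shell
# Sketch: --jars ships the jar to executors, while --driver-class-path
# puts it on the driver's system classpath, i.e. Spark's default class loader.
COMET_JAR=spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar

spark-sql --master 'local[5]' \
  --jars "$COMET_JAR" \
  --driver-class-path "$COMET_JAR" \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.shuffle.enabled=true \
  -f sql/wap.sql
```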
Ah, I remember this option. I think it would be great to update the doc to include it.

One thing more: I think you also need to mention spark.executor.extraClassPath for the executors in Spark on YARN/K8s deployments?
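For a cluster deployment that might look like the following sketch (paths are assumptions; on YARN, jars passed via --jars land in each executor's working directory, which is why a relative path is used on the executor side):

```shell
# Sketch for a YARN deployment: put the Comet jar on both the driver's
# and the executors' classpaths so it is loaded by Spark's class loader.
spark-submit \
  --master yarn \
  --jars /path/to/comet-spark.jar \
  --conf spark.driver.extraClassPath=/path/to/comet-spark.jar \
  --conf spark.executor.extraClassPath=./comet-spark.jar \
  my-job.py
```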
Also cc @sunchao
Describe the bug
When trying to run using org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager it fails due to class loader isolation.
Steps to reproduce
```shell
/home/holden/repos/high-performance-spark-examples/spark-3.4.2-bin-hadoop3/bin/spark-sql --master 'local[5]' --conf spark.eventLog.enabled=true --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.comet.CometSparkSessionExtensions --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog --conf spark.sql.catalog.spark_catalog.type=hive --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.local.type=hadoop --conf spark.sql.catalog.local.warehouse=/home/holden/repos/high-performance-spark-examples/warehouse --jars /home/holden/repos/high-performance-spark-examples/accelerators/arrow-datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar --conf spark.comet.enabled=true --conf spark.comet.exec.enabled=true --conf spark.comet.exec.all.enabled=true --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager --conf spark.comet.exec.shuffle.enabled=true --conf spark.comet.columnar.shuffle.enabled=true --conf spark.driver.userClassPathFirst=true --name sql/wap.sql -f sql/wap.sql
```
I think anything triggering a sort would suffice for a repro, but just in case, my wap.sql here is:
This results in:
Expected behavior
I expect the query to run.
The expected output is:
Additional context
You can work around this error by copying the arrow datafusion comet jar into Spark's jars directory instead of adding it with --jars, so it is loaded by the same class loader.