What happened?
When using the Spark optimizer, Amoro sets spark.executor.userClassPathFirst=true by default. This causes a java.lang.NoClassDefFoundError on the executors when Spark runs optimizing tasks (see the log output below).
Affects Versions
master
What table format are you seeing the problem on?
Iceberg
What engines are you seeing the problem on?
Optimizer
How to reproduce
No response
Relevant log output
Job aborted due to stage failure: Task 0 in stage 29.0 failed 4 times, most recent failure: Lost task 0.3 in stage 29.0 (TID 119) (task-6-5.c-10c76e23f1418d1d.cn-shanghai.emr.aliyuncs.com executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.amoro.shade.thrift.org.apache.thrift.transport.TIOStreamTransport
at org.apache.amoro.api.OptimizingTask.readObject(OptimizingTask.java:485)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2321)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2118)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1656)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:632)
at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$1(ParallelCollectionRDD.scala:73)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1470)
at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2321)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:507)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
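A note on reading the trace above: on the JVM, "NoClassDefFoundError: Could not initialize class X" generally means the class was found but its static initializer had already failed once in the same class loader. The first failure surfaces as an ExceptionInInitializerError, so the underlying cause for TIOStreamTransport may appear earlier in the executor logs. Whether a class-loading conflict triggered by userClassPathFirst is that cause is an assumption here, not something the log confirms. The following minimal sketch only reproduces the generic JVM behavior; `InitFailureDemo` and `Fragile` are hypothetical names and are not part of Amoro or Spark.

```java
public class InitFailureDemo {

    // Hypothetical stand-in for a class whose static initializer fails,
    // for example because a dependency resolved to an incompatible version.
    static class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("simulated static-init failure");
        }
    }

    // Touches Fragile and reports which kind of Throwable came back, if any.
    public static String touch() {
        try {
            return String.valueOf(Fragile.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First access: the initializer runs and fails (ExceptionInInitializerError).
        System.out.println(touch());
        // Later accesses: the JVM refuses to retry initialization and throws
        // NoClassDefFoundError ("Could not initialize class ...").
        System.out.println(touch());
    }
}
```

If the same pattern applies here, searching the executor logs for the first ExceptionInInitializerError that mentions the shaded thrift classes should point to the actual conflict.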
Anything else
No response
Are you willing to submit a PR?
[X] Yes I am willing to submit a PR!
Code of Conduct
[X] I agree to follow this project's Code of Conduct