apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Error in Executor setup After Permanent UDF Deletion #6812

Open hzxiongyinke opened 6 days ago

hzxiongyinke commented 6 days ago

Describe the bug

Hello everyone,

I've encountered an issue with Kyuubi that I'm hoping the community can help with.

I created a permanent UDF in a Kyuubi instance and later, because requirements changed, dropped it through another driver. Since then, every SQL statement executed by the driver that was already running fails with an error indicating that the UDF cannot be found. The only workaround I have found so far is to restart the Kyuubi engine.
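
To make the sequence concrete, here is a minimal Python model of what I believe is happening. It is not Spark or Kyuubi code. It assumes that the long-running driver remembers every jar it has registered and ships that list with each task, that an executor downloads any listed jar it has not fetched yet before running the task (which would match the `Executor.updateDependencies` frames in the logs below), and that the jar file was removed together with the UDF. The names `Storage`, `Driver`, `Executor`, `simulate`, and the path `oss://bucket/udfs/my_udf.jar` are all hypothetical.

```python
class Storage:
    """Stand-in for the remote file system that holds UDF jars."""

    def __init__(self):
        self.files = set()

    def fetch(self, path):
        if path not in self.files:
            raise FileNotFoundError(f"File not found: {path}")
        return path


class Driver:
    """Long-running driver: remembers every jar it has registered."""

    def __init__(self):
        self.added_jars = {}  # path -> registration timestamp

    def use_udf(self, jar_path, timestamp):
        self.added_jars[jar_path] = timestamp

    def make_task(self, sql):
        # Assumption: each task carries the full jar list,
        # whether or not the statement uses the UDF.
        return {"sql": sql, "jars": dict(self.added_jars)}


class Executor:
    def __init__(self, storage):
        self.storage = storage
        self.fetched = {}  # path -> timestamp already downloaded

    def run(self, task):
        # Dependency setup runs before the task body.
        for path, ts in task["jars"].items():
            if self.fetched.get(path, -1) < ts:
                self.storage.fetch(path)
                self.fetched[path] = ts
        return f"ok: {task['sql']}"


def simulate():
    storage = Storage()
    jar = "oss://bucket/udfs/my_udf.jar"  # hypothetical path
    storage.files.add(jar)

    driver = Driver()
    driver.use_udf(jar, timestamp=1)

    # Another driver drops the permanent UDF; the jar disappears.
    storage.files.discard(jar)

    # An executor launched after the deletion has never fetched the jar.
    fresh_executor = Executor(storage)
    try:
        return fresh_executor.run(driver.make_task("SELECT 1"))
    except FileNotFoundError as e:
        return f"failed: {e}"
```

In this model an executor that fetched the jar before the deletion keeps working, while an executor started afterwards fails on every task, even for a statement such as `SELECT 1` that never calls the UDF.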

Affects Version(s)

1.8.0

Kyuubi Server Log Output

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 279.0 failed 4 times, most recent failure: Lost task 0.3 in stage 279.0 (TID 251) (core-xxxx.cn-shanghai.emr.aliyuncs.com executor 29): java.io.FileNotFoundException:  [ErrorMessage]: File not found: .GalaxyResource/bigdata_emr_sh/xxx in bucket xxx
        at com.aliyun.jindodata.api.spec.JdoNativeResult.get(JdoNativeResult.java:54)
        at com.aliyun.jindodata.api.spec.protos.coder.JdolistDirectoryReplyDecoder.decode(JdolistDirectoryReplyDecoder.java:23)
        at com.aliyun.jindodata.api.JindoCommonApis.listDirectory(JindoCommonApis.java:112)
        at com.aliyun.jindodata.call.JindoListCall.execute(JindoListCall.java:65)
        at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:665)
        at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:60)
        at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:851)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:820)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:544)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:1010)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:1002)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
        at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
        at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
        at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:1002)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:506)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2673)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2609)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2608)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2608)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2861)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2803)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2792)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2241)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:269)

Kyuubi Engine Log Output

Spark executor error log:
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Running task 0.0 in stage 283.0 (TID 264)
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Fetching oss://xxx/xxxx/xxx with timestamp 1731900662592
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] HadoopLoginUserInfo: TOKEN: YARN_AM_RM_TOKEN
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] HadoopLoginUserInfo: User: xxxx, authMethod: SIMPLE, ugi: xxxx (auth:SIMPLE)
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] JindoHadoopSystem: Initialized native file system: 
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] FsStats: cmd=getFileStatus, src=oss://xxxx/.xxxx/xxx/xxxx, dst=null, size=0, parameter=null, time-in-ms=77, version=6.2.0
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] FsStats: cmd=list, src=oss://xxxx/.xxxx/xxxx/xxxx, dst=null, size=0, parameter=null, time-in-ms=26, version=6.2.0
24/11/18 16:06:18 ERROR [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Exception in task 0.0 in stage 283.0 (TID 264)
java.io.FileNotFoundException:  [ErrorMessage]: File not found: .xxxx/xxxx/xxxx in bucket xxxx
    at com.aliyun.jindodata.api.spec.JdoNativeResult.get(JdoNativeResult.java:54) ~[jindo-core-6.2.0.jar:?]
    at com.aliyun.jindodata.api.spec.protos.coder.JdolistDirectoryReplyDecoder.decode(JdolistDirectoryReplyDecoder.java:23) ~[jindo-core-6.2.0.jar:?]
    at com.aliyun.jindodata.api.JindoCommonApis.listDirectory(JindoCommonApis.java:112) ~[jindo-core-6.2.0.jar:?]
    at com.aliyun.jindodata.call.JindoListCall.execute(JindoListCall.java:65) ~[jindo-sdk-6.2.0.jar:?]
    at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:665) ~[jindo-sdk-6.2.0.jar:?]
    at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:60) ~[jindo-sdk-6.2.0.jar:?]
    at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:851) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:820) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:544) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:1010) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:1002) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) ~[scala-library-2.12.15.jar:?]
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:1002) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:506) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_392]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_392]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_392]
24/11/18 16:06:18 INFO [dispatcher-Executor] YarnCoarseGrainedExecutorBackend: Got assigned task 265
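
One way to check from a trace like this whether the failure happens while the executor is fetching dependencies, rather than inside the query's own code, is to look for `updateDependencies` frames. The following is a minimal sketch: the helper names `frames` and `failed_during_dependency_setup` are hypothetical, and the sample lines are copied from the log above.

```python
import re

# Matches a JVM stack frame such as
# "at pkg.Class.method(File.scala:123) ~[some.jar:?]"
FRAME = re.compile(r"^\s*at\s+([\w.$]+)\(([^)]*)\)")

SAMPLE = r"""
    at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:851) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:1002) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:506) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
"""


def frames(trace):
    """Return the fully qualified method name of each stack frame."""
    names = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def failed_during_dependency_setup(trace):
    """True if any frame belongs to the executor's dependency update."""
    return any("updateDependencies" in name for name in frames(trace))
```

Calling `failed_during_dependency_setup(SAMPLE)` returns `True` for the frames above, which is consistent with the task failing before any query code runs.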

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

github-actions[bot] commented 6 days ago

Hello @hzxiongyinke, thanks for taking the time to report this issue! We really appreciate the community's efforts to improve Apache Kyuubi.

hzxiongyinke commented 6 days ago

cc @yaooqinn @pan3793

yaooqinn commented 6 days ago

The failed Spark application didn't even access the missing jar file, did it?