apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] PySpark job in PENDING state forever #6539

Open sirajulm opened 1 month ago

sirajulm commented 1 month ago

Describe the bug

I'm trying to submit a PySpark batch job using the REST endpoint. I can see that the Spark job completes, but Kyuubi shows the job stuck in the PENDING state.

(Screenshot: Kyuubi shows the batch in the PENDING state.)

Setup:

Sample PySpark code:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Sample PySpark Code") \
    .getOrCreate()

data = [
    (1, "Alice", 28),
    (2, "Bob", 24),
    (3, "Cathy", 29),
    (4, "David", 30),
    (5, "Eve", 22)
]

columns = ["id", "name", "age"]

df = spark.createDataFrame(data, columns)

# Show the schema and some rows of the DataFrame
df.printSchema()
df.show()

df_selected = df.select("name", "age")

# Filter rows based on a condition
df_filtered = df_selected.filter(df_selected["age"] > 25)

# Show the transformed DataFrame
df_filtered.show()

spark.stop()
Batch submission request:

curl --location --request POST 'http://localhost:10099/api/v1/batches' \
--form 'batchRequest="{\"batchType\":\"pyspark\",\"name\":\"PySpark Pi App\", \"conf\": {\"spark.master\":\"spark://spark-master:7077\"}}";type=application/json' \
--form 'resourceFile=@"/code/kyuubi/pyspark_sample/main.py"'
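
For reference, the same submission plus a status poll can be expressed in Python. This is a minimal sketch, not authoritative: it assumes the requests package, the unauthenticated server above, and that the batch JSON response exposes id and state fields.

# Minimal sketch: submit the PySpark batch and poll its state via Kyuubi's REST API.
# Assumes the `requests` package, an unauthenticated Kyuubi server on localhost:10099,
# and that the batch JSON response carries "id" and "state" fields.
import json
import time

import requests

KYUUBI = "http://localhost:10099/api/v1/batches"

batch_request = {
    "batchType": "pyspark",
    "name": "PySpark Pi App",
    "conf": {"spark.master": "spark://spark-master:7077"},
}

with open("/code/kyuubi/pyspark_sample/main.py", "rb") as f:
    resp = requests.post(
        KYUUBI,
        files={
            "batchRequest": (None, json.dumps(batch_request), "application/json"),
            "resourceFile": ("main.py", f),
        },
    )
resp.raise_for_status()
batch_id = resp.json()["id"]

# Poll until the batch leaves PENDING/RUNNING (in this report it stays PENDING forever).
while True:
    state = requests.get(f"{KYUUBI}/{batch_id}").json().get("state")
    print(batch_id, state)
    if state not in ("PENDING", "RUNNING"):
        break
    time.sleep(5)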

Spark job log

Spark Executor Command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=44035" "-Djava.net.preferIPv6Addresses=false" "-XX:+IgnoreUnrecognizedVMOptions" "--add-opens=java.base/java.lang=ALL-UNNAMED" "--add-opens=java.base/java.lang.invoke=ALL-UNNAMED" "--add-opens=java.base/java.lang.reflect=ALL-UNNAMED" "--add-opens=java.base/java.io=ALL-UNNAMED" "--add-opens=java.base/java.net=ALL-UNNAMED" "--add-opens=java.base/java.nio=ALL-UNNAMED" "--add-opens=java.base/java.util=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED" "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" "--add-opens=java.base/sun.nio.cs=ALL-UNNAMED" "--add-opens=java.base/sun.security.action=ALL-UNNAMED" "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED" "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED" "-Djdk.reflect.useDirectMethodHandle=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@7e5c53b23aa7:44035" "--executor-id" "0" "--hostname" "172.20.0.3" "--cores" "1" "--app-id" "app-20240716160002-0003" "--worker-url" "spark://Worker@172.20.0.3:44167" "--resourceProfileId" "0"
========================================

Picked up _JAVA_OPTIONS: --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
24/07/16 16:00:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 394@73aecc79e3f8
24/07/16 16:00:03 INFO SignalUtils: Registering signal handler for TERM
24/07/16 16:00:03 INFO SignalUtils: Registering signal handler for HUP
24/07/16 16:00:03 INFO SignalUtils: Registering signal handler for INT
24/07/16 16:00:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/07/16 16:00:03 INFO SecurityManager: Changing view acls to: spark
24/07/16 16:00:03 INFO SecurityManager: Changing modify acls to: spark
24/07/16 16:00:03 INFO SecurityManager: Changing view acls groups to: 
24/07/16 16:00:03 INFO SecurityManager: Changing modify acls groups to: 
24/07/16 16:00:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: spark; groups with view permissions: EMPTY; users with modify permissions: spark; groups with modify permissions: EMPTY
24/07/16 16:00:03 INFO TransportClientFactory: Successfully created connection to 7e5c53b23aa7/172.20.0.4:44035 after 35 ms (0 ms spent in bootstraps)
24/07/16 16:00:03 INFO SecurityManager: Changing view acls to: spark
24/07/16 16:00:03 INFO SecurityManager: Changing modify acls to: spark
24/07/16 16:00:03 INFO SecurityManager: Changing view acls groups to: 
24/07/16 16:00:03 INFO SecurityManager: Changing modify acls groups to: 
24/07/16 16:00:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: spark; groups with view permissions: EMPTY; users with modify permissions: spark; groups with modify permissions: EMPTY
24/07/16 16:00:03 INFO TransportClientFactory: Successfully created connection to 7e5c53b23aa7/172.20.0.4:44035 after 6 ms (0 ms spent in bootstraps)
24/07/16 16:00:04 INFO DiskBlockManager: Created local directory at /tmp/spark-e30bc5b9-d9f6-4658-be36-98af0238de03/executor-f988152e-b542-4f1d-a816-fc8b29e72ed3/blockmgr-9aad88fe-77ab-487e-a24b-070e0b84e696
24/07/16 16:00:04 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
24/07/16 16:00:04 INFO WorkerWatcher: Connecting to worker spark://Worker@172.20.0.3:44167
24/07/16 16:00:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@7e5c53b23aa7:44035
24/07/16 16:00:04 INFO TransportClientFactory: Successfully created connection to /172.20.0.3:44167 after 1 ms (0 ms spent in bootstraps)
24/07/16 16:00:04 INFO ResourceUtils: ==============================================================
24/07/16 16:00:04 INFO ResourceUtils: No custom resources configured for spark.executor.
24/07/16 16:00:04 INFO ResourceUtils: ==============================================================
24/07/16 16:00:04 INFO WorkerWatcher: Successfully connected to spark://Worker@172.20.0.3:44167
24/07/16 16:00:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
24/07/16 16:00:04 INFO Executor: Starting executor ID 0 on host 172.20.0.3
24/07/16 16:00:04 INFO Executor: OS info Linux, 5.10.104-linuxkit, aarch64
24/07/16 16:00:04 INFO Executor: Java version 17.0.11
24/07/16 16:00:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38653.
24/07/16 16:00:04 INFO NettyBlockTransferService: Server created on 172.20.0.3:38653
24/07/16 16:00:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/07/16 16:00:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(0, 172.20.0.3, 38653, None)
24/07/16 16:00:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(0, 172.20.0.3, 38653, None)
24/07/16 16:00:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(0, 172.20.0.3, 38653, None)
24/07/16 16:00:04 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
24/07/16 16:00:04 INFO Executor: Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@153e2660 for default.
24/07/16 16:00:04 INFO CoarseGrainedExecutorBackend: Got assigned task 0
24/07/16 16:00:04 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
24/07/16 16:00:04 INFO TorrentBroadcast: Started reading broadcast variable 0 with 1 pieces (estimated total size 4.0 MiB)
24/07/16 16:00:04 INFO TransportClientFactory: Successfully created connection to 7e5c53b23aa7/172.20.0.4:34119 after 1 ms (0 ms spent in bootstraps)
24/07/16 16:00:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.7 KiB, free 434.4 MiB)
24/07/16 16:00:04 INFO TorrentBroadcast: Reading broadcast variable 0 took 81 ms
24/07/16 16:00:04 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 13.0 KiB, free 434.4 MiB)
24/07/16 16:00:05 INFO CodeGenerator: Code generated in 108.676333 ms
24/07/16 16:00:05 INFO PythonRunner: Times: total = 411, boot = 323, init = 87, finish = 1
24/07/16 16:00:05 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2004 bytes result sent to driver
24/07/16 16:00:05 INFO CoarseGrainedExecutorBackend: Got assigned task 1
24/07/16 16:00:05 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
24/07/16 16:00:05 INFO TorrentBroadcast: Started reading broadcast variable 1 with 1 pieces (estimated total size 4.0 MiB)
24/07/16 16:00:05 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.7 KiB, free 434.4 MiB)
24/07/16 16:00:05 INFO TorrentBroadcast: Reading broadcast variable 1 took 7 ms
24/07/16 16:00:05 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 13.0 KiB, free 434.4 MiB)
24/07/16 16:00:05 INFO PythonRunner: Times: total = 50, boot = -163, init = 213, finish = 0
24/07/16 16:00:05 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2031 bytes result sent to driver
24/07/16 16:00:06 INFO CoarseGrainedExecutorBackend: Got assigned task 2
24/07/16 16:00:06 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
24/07/16 16:00:06 INFO TorrentBroadcast: Started reading broadcast variable 2 with 1 pieces (estimated total size 4.0 MiB)
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 6.8 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO TorrentBroadcast: Reading broadcast variable 2 took 6 ms
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 13.1 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO CodeGenerator: Code generated in 9.61425 ms
24/07/16 16:00:06 INFO PythonRunner: Times: total = 47, boot = -618, init = 665, finish = 0
24/07/16 16:00:06 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 2027 bytes result sent to driver
24/07/16 16:00:06 INFO CoarseGrainedExecutorBackend: Got assigned task 3
24/07/16 16:00:06 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
24/07/16 16:00:06 INFO TorrentBroadcast: Started reading broadcast variable 3 with 1 pieces (estimated total size 4.0 MiB)
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.8 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO TorrentBroadcast: Reading broadcast variable 3 took 6 ms
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 13.1 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO PythonRunner: Times: total = 46, boot = -32, init = 78, finish = 0
24/07/16 16:00:06 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 1959 bytes result sent to driver
24/07/16 16:00:06 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
24/07/16 16:00:06 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

Conclusion: From the logs it is clear that the job runs and completes as expected, but Kyuubi cannot determine its status and leaves it in the PENDING state forever.

Affects Version(s)

1.9.1

Kyuubi Server Log Output

No logs available

Kyuubi Engine Log Output

24/07/16 16:00:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/07/16 16:00:02 INFO SparkContext: Running Spark version 3.5.1
24/07/16 16:00:02 INFO SparkContext: OS info Linux, 5.10.104-linuxkit, aarch64
24/07/16 16:00:02 INFO SparkContext: Java version 17.0.11
24/07/16 16:00:02 INFO ResourceUtils: ==============================================================
24/07/16 16:00:02 INFO ResourceUtils: No custom resources configured for spark.driver.
24/07/16 16:00:02 INFO ResourceUtils: ==============================================================
24/07/16 16:00:02 INFO SparkContext: Submitted application: Sample PySpark Code
24/07/16 16:00:02 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
24/07/16 16:00:02 INFO ResourceProfile: Limiting resource is cpu
24/07/16 16:00:02 INFO ResourceProfileManager: Added ResourceProfile id: 0
24/07/16 16:00:02 INFO SecurityManager: Changing view acls to: spark
24/07/16 16:00:02 INFO SecurityManager: Changing modify acls to: spark
24/07/16 16:00:02 INFO SecurityManager: Changing view acls groups to:
24/07/16 16:00:02 INFO SecurityManager: Changing modify acls groups to:
24/07/16 16:00:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: spark; groups with view permissions: EMPTY; users with modify permissions: spark; groups with modify permissions: EMPTY
24/07/16 16:00:02 INFO Utils: Successfully started service 'sparkDriver' on port 44035.
24/07/16 16:00:02 INFO SparkEnv: Registering MapOutputTracker
24/07/16 16:00:02 INFO SparkEnv: Registering BlockManagerMaster
24/07/16 16:00:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/07/16 16:00:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
24/07/16 16:00:02 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
24/07/16 16:00:02 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-d6e03fc3-f910-49de-b842-2fdce0701bd8
24/07/16 16:00:02 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
24/07/16 16:00:02 INFO SparkEnv: Registering OutputCommitCoordinator
24/07/16 16:00:02 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
24/07/16 16:00:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
24/07/16 16:00:02 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
24/07/16 16:00:02 INFO TransportClientFactory: Successfully created connection to spark-master/172.20.0.2:7077 after 15 ms (0 ms spent in bootstraps)
24/07/16 16:00:02 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20240716160002-0003
24/07/16 16:00:02 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240716160002-0003/0 on worker-20240716154203-172.20.0.3-44167 (172.20.0.3:44167) with 1 core(s)
24/07/16 16:00:02 INFO StandaloneSchedulerBackend: Granted executor ID app-20240716160002-0003/0 on hostPort 172.20.0.3:44167 with 1 core(s), 1024.0 MiB RAM
24/07/16 16:00:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34119.
24/07/16 16:00:02 INFO NettyBlockTransferService: Server created on 7e5c53b23aa7:34119
24/07/16 16:00:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/07/16 16:00:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 7e5c53b23aa7, 34119, None)
24/07/16 16:00:02 INFO BlockManagerMasterEndpoint: Registering block manager 7e5c53b23aa7:34119 with 434.4 MiB RAM, BlockManagerId(driver, 7e5c53b23aa7, 34119, None)
24/07/16 16:00:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 7e5c53b23aa7, 34119, None)
24/07/16 16:00:02 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 7e5c53b23aa7, 34119, None)
24/07/16 16:00:02 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240716160002-0003/0 is now RUNNING
24/07/16 16:00:02 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
24/07/16 16:00:02 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
24/07/16 16:00:02 INFO SharedState: Warehouse path is 'file:/opt/kyuubi/work/anonymous/spark-warehouse'.
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)

24/07/16 16:00:04 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.20.0.3:48738) with ID 0,  ResourceProfileId 0
24/07/16 16:00:04 INFO BlockManagerMasterEndpoint: Registering block manager 172.20.0.3:38653 with 434.4 MiB RAM, BlockManagerId(0, 172.20.0.3, 38653, None)
24/07/16 16:00:04 INFO CodeGenerator: Code generated in 96.000666 ms
24/07/16 16:00:04 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
24/07/16 16:00:04 INFO DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
24/07/16 16:00:04 INFO DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0)
24/07/16 16:00:04 INFO DAGScheduler: Parents of final stage: List()
24/07/16 16:00:04 INFO DAGScheduler: Missing parents: List()
24/07/16 16:00:04 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
24/07/16 16:00:04 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 13.0 KiB, free 434.4 MiB)
24/07/16 16:00:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.7 KiB, free 434.4 MiB)
24/07/16 16:00:04 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 7e5c53b23aa7:34119 (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:04 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1585
24/07/16 16:00:04 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
24/07/16 16:00:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
24/07/16 16:00:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (172.20.0.3, executor 0, partition 0, PROCESS_LOCAL, 7651 bytes)
24/07/16 16:00:04 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.20.0.3:38653 (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:05 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 854 ms on 172.20.0.3 (executor 0) (1/1)
24/07/16 16:00:05 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
24/07/16 16:00:05 INFO PythonAccumulatorV2: Connected to AccumulatorServer at host: 127.0.0.1 port: 52719
24/07/16 16:00:05 INFO DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0) finished in 0.933 s
24/07/16 16:00:05 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
24/07/16 16:00:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
24/07/16 16:00:05 INFO DAGScheduler: Job 0 finished: showString at NativeMethodAccessorImpl.java:0, took 0.949888 s
24/07/16 16:00:05 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
24/07/16 16:00:05 INFO DAGScheduler: Got job 1 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
24/07/16 16:00:05 INFO DAGScheduler: Final stage: ResultStage 1 (showString at NativeMethodAccessorImpl.java:0)
24/07/16 16:00:05 INFO DAGScheduler: Parents of final stage: List()
24/07/16 16:00:05 INFO DAGScheduler: Missing parents: List()
24/07/16 16:00:05 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
24/07/16 16:00:05 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 13.0 KiB, free 434.4 MiB)
24/07/16 16:00:05 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.7 KiB, free 434.4 MiB)
24/07/16 16:00:05 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 7e5c53b23aa7:34119 (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:05 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1585
24/07/16 16:00:05 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(1))
24/07/16 16:00:05 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
24/07/16 16:00:05 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (172.20.0.3, executor 0, partition 1, PROCESS_LOCAL, 7690 bytes)
24/07/16 16:00:05 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.20.0.3:38653 (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:05 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 82 ms on 172.20.0.3 (executor 0) (1/1)
24/07/16 16:00:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
24/07/16 16:00:05 INFO DAGScheduler: ResultStage 1 (showString at NativeMethodAccessorImpl.java:0) finished in 0.090 s
24/07/16 16:00:05 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
24/07/16 16:00:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
24/07/16 16:00:05 INFO DAGScheduler: Job 1 finished: showString at NativeMethodAccessorImpl.java:0, took 0.092564 s
24/07/16 16:00:05 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 7e5c53b23aa7:34119 in memory (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:05 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.20.0.3:38653 in memory (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:06 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 7e5c53b23aa7:34119 in memory (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:06 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 172.20.0.3:38653 in memory (size: 6.7 KiB, free: 434.4 MiB)
24/07/16 16:00:06 INFO CodeGenerator: Code generated in 22.422917 ms
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Alice| 28|
|  2|  Bob| 24|
|  3|Cathy| 29|
|  4|David| 30|
|  5|  Eve| 22|
+---+-----+---+

24/07/16 16:00:06 INFO CodeGenerator: Code generated in 7.778417 ms
24/07/16 16:00:06 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
24/07/16 16:00:06 INFO DAGScheduler: Got job 2 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
24/07/16 16:00:06 INFO DAGScheduler: Final stage: ResultStage 2 (showString at NativeMethodAccessorImpl.java:0)
24/07/16 16:00:06 INFO DAGScheduler: Parents of final stage: List()
24/07/16 16:00:06 INFO DAGScheduler: Missing parents: List()
24/07/16 16:00:06 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[8] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 13.1 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 6.8 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 7e5c53b23aa7:34119 (size: 6.8 KiB, free: 434.4 MiB)
24/07/16 16:00:06 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1585
24/07/16 16:00:06 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[8] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
24/07/16 16:00:06 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0
24/07/16 16:00:06 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2) (172.20.0.3, executor 0, partition 0, PROCESS_LOCAL, 7651 bytes)
24/07/16 16:00:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.20.0.3:38653 (size: 6.8 KiB, free: 434.4 MiB)
24/07/16 16:00:06 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 72 ms on 172.20.0.3 (executor 0) (1/1)
24/07/16 16:00:06 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
24/07/16 16:00:06 INFO DAGScheduler: ResultStage 2 (showString at NativeMethodAccessorImpl.java:0) finished in 0.078 s
24/07/16 16:00:06 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
24/07/16 16:00:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
24/07/16 16:00:06 INFO DAGScheduler: Job 2 finished: showString at NativeMethodAccessorImpl.java:0, took 0.080722 s
24/07/16 16:00:06 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
24/07/16 16:00:06 INFO DAGScheduler: Got job 3 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
24/07/16 16:00:06 INFO DAGScheduler: Final stage: ResultStage 3 (showString at NativeMethodAccessorImpl.java:0)
24/07/16 16:00:06 INFO DAGScheduler: Parents of final stage: List()
24/07/16 16:00:06 INFO DAGScheduler: Missing parents: List()
24/07/16 16:00:06 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[8] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 13.1 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.8 KiB, free 434.4 MiB)
24/07/16 16:00:06 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 7e5c53b23aa7:34119 (size: 6.8 KiB, free: 434.4 MiB)
24/07/16 16:00:06 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1585
24/07/16 16:00:06 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[8] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(1))
24/07/16 16:00:06 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks resource profile 0
24/07/16 16:00:06 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3) (172.20.0.3, executor 0, partition 1, PROCESS_LOCAL, 7690 bytes)
24/07/16 16:00:06 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.20.0.3:38653 (size: 6.8 KiB, free: 434.4 MiB)
24/07/16 16:00:06 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 68 ms on 172.20.0.3 (executor 0) (1/1)
24/07/16 16:00:06 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
24/07/16 16:00:06 INFO DAGScheduler: ResultStage 3 (showString at NativeMethodAccessorImpl.java:0) finished in 0.072 s
24/07/16 16:00:06 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
24/07/16 16:00:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
24/07/16 16:00:06 INFO DAGScheduler: Job 3 finished: showString at NativeMethodAccessorImpl.java:0, took 0.075610 s
24/07/16 16:00:06 INFO CodeGenerator: Code generated in 5.160208 ms
+-----+---+
| name|age|
+-----+---+
|Alice| 28|
|Cathy| 29|
|David| 30|
+-----+---+

24/07/16 16:00:06 INFO SparkContext: Invoking stop() from shutdown hook
24/07/16 16:00:06 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/07/16 16:00:06 INFO SparkUI: Stopped Spark web UI at http://7e5c53b23aa7:4040
24/07/16 16:00:06 INFO StandaloneSchedulerBackend: Shutting down all executors
24/07/16 16:00:06 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Asking each executor to shut down
24/07/16 16:00:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/07/16 16:00:06 INFO MemoryStore: MemoryStore cleared
24/07/16 16:00:06 INFO BlockManager: BlockManager stopped
24/07/16 16:00:06 INFO BlockManagerMaster: BlockManagerMaster stopped
24/07/16 16:00:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/07/16 16:00:06 INFO SparkContext: Successfully stopped SparkContext
24/07/16 16:00:06 INFO ShutdownHookManager: Shutdown hook called
24/07/16 16:00:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-4156bfce-659a-4d64-b1b2-be7bf15a9ea4
24/07/16 16:00:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-c948551e-35b5-4ab8-8dd5-76db70beddf3
24/07/16 16:00:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-c948551e-35b5-4ab8-8dd5-76db70beddf3/pyspark-3f35a35e-7954-4f6c-83b3-ba67744ec2ba

Kyuubi Server Configurations

kyuubi.authentication=NONE
kyuubi.frontend.bind.host=0.0.0.0
kyuubi.frontend.protocols=THRIFT_BINARY,REST
kyuubi.frontend.thrift.binary.bind.port=10009
kyuubi.frontend.rest.bind.port=10099
# kyuubi.ha.addresses=zookeeper:2181
kyuubi.session.engine.idle.timeout=PT5M
kyuubi.operation.incremental.collect=true
kyuubi.operation.progress.enabled=true

kyuubi.metrics.reporters=PROMETHEUS
kyuubi.metrics.prometheus.port=10019
kyuubi.batch.conf.spark.spark.master=spark://spark-master:7077
spark.master=spark://spark-master:7077

Kyuubi Engine Configurations

No response

Additional context

I'm new to Kyuubi and Scala, but I would still be happy to help.

Are you willing to submit a PR?

github-actions[bot] commented 1 month ago

Hello @sirajulm, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

pan3793 commented 1 month ago

This is because Kyuubi does not implement a Spark Standalone ApplicationOperation yet.

For interactive sessions (not your case), Kyuubi submits a custom Spark app with a built-in RPC server, so the Kyuubi server can communicate with the Spark driver directly to track query status.

For batch Spark jobs (whether a Java/Scala jar, PySpark, or R), Kyuubi just assembles the spark-submit command to submit your job to the resource manager (Local, Mesos, YARN, K8s, or Standalone) and then uses an ApplicationOperation to query the job status. Currently, only Local, YARN, and K8s are supported.
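
For anyone who wants to prototype the missing piece: a Standalone ApplicationOperation would essentially have to ask the Standalone master for the application state. The rough Python sketch below is not Kyuubi code; it assumes the master web UI is reachable at spark-master:8080 and serves a JSON summary at /json/ with activeapps and completedapps entries.

# Rough sketch (not Kyuubi code): the lookup a Standalone ApplicationOperation would need,
# assuming the Standalone master web UI is reachable on port 8080 and serves a JSON
# summary at /json/ containing "activeapps" and "completedapps" lists.
import requests

MASTER_UI = "http://spark-master:8080/json/"


def standalone_app_state(app_name: str) -> str:
    """Return the state of the most recent Standalone app matching app_name."""
    summary = requests.get(MASTER_UI).json()
    apps = summary.get("activeapps", []) + summary.get("completedapps", [])
    matches = [app for app in apps if app.get("name") == app_name]
    if not matches:
        return "NOT_FOUND"
    # Entries are assumed to carry fields such as "id", "name", and "state"
    # (e.g. RUNNING, FINISHED).
    return matches[-1].get("state", "UNKNOWN")


print(standalone_app_state("PySpark Pi App"))

Matching on the application name here is only a simplification for illustration; a real implementation would need a more robust way to identify the submitted application.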

sirajulm commented 1 month ago

@pan3793 Thanks for the quick response. I will try with K8s, which more closely matches my production setup.