Renien / docker-spark-livy

Spark Standalone & Livy
MIT License

pyspark jobs not running on Livy #3

Open ismaelnobregadev opened 1 year ago

ismaelnobregadev commented 1 year ago

Name and Version

docker-spark-livy

What steps will reproduce the bug?

Create a PySpark session on Livy, either with the sparkmagic extension or via curl.
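For anyone trying to reproduce this without sparkmagic, here is a minimal sketch of the same request in Python. It uses Livy's REST endpoint `POST /sessions` with `{"kind": "pyspark"}`. The helper names `build_session_request` and `create_session` are made up for illustration, and the base URL is an assumption: 8998 is Livy's default port, but the port mapping of this image may differ.

```python
import json
import urllib.request


def build_session_request(livy_url: str, kind: str = "pyspark") -> urllib.request.Request:
    """Build (but do not send) the POST /sessions request for a Livy session."""
    payload = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/sessions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(livy_url: str, kind: str = "pyspark") -> dict:
    """Send the request and return Livy's JSON response as a dict."""
    with urllib.request.urlopen(build_session_request(livy_url, kind)) as resp:
        return json.load(resp)
```

Calling `create_session("http://localhost:8998")` (assumed address) should return the new session's JSON description. Passing `kind="spark"` instead requests the Scala interpreter, which is the case that works here.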

What is the expected behavior?

A Spark session should be created.

What do you see instead?

We built your image. When we submit commands from a PySpark kernel, the Livy session crashes. The Spark kernel (Scala) runs fine.

The code failed because of a fatal error: Session 0 unexpectedly reached final status 'error'. See logs: stdout:

stderr:
2023-11-03 19:52:21,930 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-11-03 19:52:22,529 INFO driver.RSCDriver: Connecting to: 546f6dce20f7:10000
2023-11-03 19:52:22,530 INFO driver.RSCDriver: Starting RPC server...
2023-11-03 19:52:22,762 INFO rpc.RpcServer: Connected to the port 10001
2023-11-03 19:52:22,762 WARN rsc.RSCConf: Your hostname, 546f6dce20f7, resolves to a loopback address, but we couldn't find any external IP address!
2023-11-03 19:52:22,762 WARN rsc.RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
2023-11-03 19:52:23,416 INFO driver.RSCDriver: Received job request 75d2c76d-ed10-4978-8fd4-8f96a7aa6d86
2023-11-03 19:52:23,417 INFO driver.RSCDriver: SparkContext not yet up, queueing job request.
2023-11-03 19:52:26,826 INFO driver.SparkEntries: Starting Spark context...
2023-11-03 19:52:26,845 INFO spark.SparkContext: Running Spark version 2.4.7
2023-11-03 19:52:26,869 INFO spark.SparkContext: Submitted application: livy-session-0
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing view acls to: root
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing modify acls to: root
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing view acls groups to:
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing modify acls groups to:
2023-11-03 19:52:26,920 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2023-11-03 19:52:27,073 INFO util.Utils: Successfully started service 'sparkDriver' on port 34475.
2023-11-03 19:52:27,152 INFO spark.SparkEnv: Registering MapOutputTracker
2023-11-03 19:52:27,199 INFO spark.SparkEnv: Registering BlockManagerMaster
2023-11-03 19:52:27,202 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2023-11-03 19:52:27,204 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
2023-11-03 19:52:27,237 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-53840d8c-71af-4e27-a934-a94a92befa63
2023-11-03 19:52:27,272 INFO memory.MemoryStore: MemoryStore started with capacity 353.4 MB
2023-11-03 19:52:27,308 INFO spark.SparkEnv: Registering OutputCommitCoordinator
2023-11-03 19:52:27,410 INFO util.log: Logging initialized @6966ms
2023-11-03 19:52:27,479 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2023-11-03 19:52:27,499 INFO server.Server: Started @7056ms
2023-11-03 19:52:27,520 INFO server.AbstractConnector: Started ServerConnector@7156c9a1{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2023-11-03 19:52:27,520 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
2023-11-03 19:52:27,570 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2e34e0b1{/jobs,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,571 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bfc4707{/jobs/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,580 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e070ca1{/jobs/job,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,581 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64ec5438{/jobs/job/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,581 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1d51f3dd{/stages,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,584 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b4586d2{/stages/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,584 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@122f5e2b{/stages/stage,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,585 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@418d369b{/stages/stage/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,586 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@293be687{/stages/pool,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,586 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1941bcf7{/stages/pool/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,587 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@552809c4{/storage,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,589 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d39c414{/storage/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,589 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bc00aea{/storage/rdd,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,590 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1ff2a961{/storage/rdd/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,590 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a01f572{/environment,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,591 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@392b8942{/environment/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,592 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32b743d7{/executors,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,592 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@38045d41{/executors/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,593 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3dab9556{/executors/threadDump,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,593 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58af6a38{/executors/threadDump/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,600 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@508e3301{/static,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,601 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44174e4e{/,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,602 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55395b0a{/api,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,602 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46ef0919{/jobs/job/kill,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,603 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f3a3496{/stages/stage/kill,null,AVAILABLE,@Spark}
2023-11-03 19:52:27,605 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://546f6dce20f7:4040/
2023-11-03 19:52:27,618 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/livy-api-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-api-0.7.0-incubating.jar with timestamp 1699041147618
2023-11-03 19:52:27,618 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/livy-thriftserver-session-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-thriftserver-session-0.7.0-incubating.jar with timestamp 1699041147618
2023-11-03 19:52:27,618 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/netty-all-4.0.37.Final.jar at spark://546f6dce20f7:34475/jars/netty-all-4.0.37.Final.jar with timestamp 1699041147618
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/livy-rsc-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-rsc-0.7.0-incubating.jar with timestamp 1699041147619
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/repl_2.11-jars/livy-repl_2.11-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-repl_2.11-0.7.0-incubating.jar with timestamp 1699041147619
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/repl_2.11-jars/livy-core_2.11-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-core_2.11-0.7.0-incubating.jar with timestamp 1699041147619
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/repl_2.11-jars/commons-codec-1.9.jar at spark://546f6dce20f7:34475/jars/commons-codec-1.9.jar with timestamp 1699041147619
2023-11-03 19:52:27,748 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://master:7077...
2023-11-03 19:52:27,808 INFO client.TransportClientFactory: Successfully created connection to master/172.22.0.2:7077 after 34 ms (0 ms spent in bootstraps)
2023-11-03 19:52:28,067 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20231103195228-0000
2023-11-03 19:52:28,099 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46123.
2023-11-03 19:52:28,100 INFO netty.NettyBlockTransferService: Server created on 546f6dce20f7:46123
2023-11-03 19:52:28,101 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2023-11-03 19:52:28,116 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/0 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,117 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/0 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,139 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/1 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,140 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/1 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,142 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/2 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,145 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/2 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,160 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/3 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,160 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/3 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,160 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/4 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,161 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/4 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,222 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/0 is now RUNNING
2023-11-03 19:52:28,232 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,232 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/1 is now RUNNING
2023-11-03 19:52:28,242 INFO storage.BlockManagerMasterEndpoint: Registering block manager 546f6dce20f7:46123 with 353.4 MB RAM, BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,246 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,249 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,256 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/2 is now RUNNING
2023-11-03 19:52:28,260 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/3 is now RUNNING
2023-11-03 19:52:28,287 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/4 is now RUNNING
2023-11-03 19:52:28,340 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@38de00ce{/metrics/json,null,AVAILABLE,@Spark}
2023-11-03 19:52:28,398 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2023-11-03 19:52:28,437 INFO driver.SparkEntries: Spark context finished initialization in 1610ms
2023-11-03 19:52:28,564 INFO driver.SparkEntries: Created Spark session.
Exception in thread "Thread-24" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveContext
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetPublicMethods(Class.java:2902)
	at java.lang.Class.getMethods(Class.java:1615)
	at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
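The class named in the `NoClassDefFoundError`, `org.apache.spark.sql.hive.HiveContext`, ships in Spark's `spark-hive` module. One thing that might be worth checking, as a guess rather than a confirmed diagnosis, is whether the Spark distribution inside the image includes that jar. The sketch below looks for it under `<SPARK_HOME>/jars`. The function names are made up for illustration, and the actual Spark install path inside the image is an assumption you would need to fill in yourself.

```python
import glob
import os


def find_spark_hive_jars(spark_home: str) -> list:
    """Return the spark-hive jars found under <spark_home>/jars, sorted by name.

    The pattern requires an underscore right after "spark-hive", so
    spark-hive-thriftserver jars are not counted.
    """
    pattern = os.path.join(spark_home, "jars", "spark-hive_*.jar")
    return sorted(glob.glob(pattern))


def has_hive_support(spark_home: str) -> bool:
    """True if at least one spark-hive jar is present in the distribution."""
    return len(find_spark_hive_jars(spark_home)) > 0
```

Running `has_hive_support(os.environ["SPARK_HOME"])` inside the Livy container would show whether the jar is on disk. If it returns `False`, that would be consistent with the error above, though other classpath problems could produce the same exception.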

ismaelnobregadev commented 1 year ago

Reopened to fill in the reproduction steps.