Mellanox / SparkRDMA

This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Apache License 2.0
240 stars 70 forks

Errors when using 2 or more nodes #37

Open SmirAlex opened 3 years ago

SmirAlex commented 3 years ago

Hello! I found that SparkRDMA works correctly on our cluster only when 2 worker nodes interact with each other. The number of executors on each node doesn't matter (I tried 1, 2, and 4 executors per node). When 3 or more nodes take part in task processing, errors occur (logs below). The job still finishes successfully, but it takes much longer than without RDMA. How can I solve this problem? When only 2 nodes take part in task processing, I do see a performance gain from SparkRDMA.

Cluster

4 nodes (all nodes can be workers)

Component Description
Storage Samsung SSD 970 EVO Plus 250GB (NVMe)
OS Kubuntu 20.04 LTS
Network switch Huawei S5720 36-C with switching capacity of 598 Gbit/s
Network physical layer single mode optical fiber
Network adapter Mellanox ConnectX-4 Lx EN, 10 GbE single port SFP+
Memory 32 GB
CPU Intel Core i7-9700 CPU @ 3.00GHz, 8 cores with 1 thread per core

Yarn configurations (Hadoop 3.1.3)

Configuration Value
yarn.nodemanager.resource.memory-mb 27648
yarn.scheduler.maximum-allocation-mb 27648
yarn.scheduler.minimum-allocation-mb 13824
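
A quick way to see how these limits interact with the executor sizing listed in the Spark configuration table of this post. This is only a sketch: it assumes the scheduler rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb (the behaviour usually described for the Capacity Scheduler), and the function name `yarn_container_mb` is made up for illustration.

```python
import math


def yarn_container_mb(request_mb, min_alloc_mb, max_alloc_mb):
    """Estimate the container size YARN grants for a request.

    Assumption: requests are rounded up to a multiple of the minimum
    allocation and rejected if they exceed the maximum allocation.
    """
    if request_mb > max_alloc_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    steps = max(1, math.ceil(request_mb / min_alloc_mb))
    return min(steps * min_alloc_mb, max_alloc_mb)


# Values taken from the tables in this post.
NODE_MB = 27648        # yarn.nodemanager.resource.memory-mb
MAX_ALLOC_MB = 27648   # yarn.scheduler.maximum-allocation-mb
MIN_ALLOC_MB = 13824   # yarn.scheduler.minimum-allocation-mb

executor_mb = 7 * 1024   # spark.executor.memory 7g
overhead_mb = 8 * 1024   # spark.executor.memoryOverhead 8g

container_mb = yarn_container_mb(executor_mb + overhead_mb, MIN_ALLOC_MB, MAX_ALLOC_MB)
executors_per_node = NODE_MB // container_mb

print(f"requested {executor_mb + overhead_mb} MB, granted {container_mb} MB")
print(f"executors that fit on one node: {executors_per_node}")
```

Under that rounding assumption, a 15360 MB request would occupy a full 27648 MB container, which is consistent with running 1 executor per node.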

Spark configurations (Spark 2.4.0)

SparkRDMA-3.1

DiSNI-1.7

Configuration Value Description
yarn.executor.num 3 For this test I use 3 nodes (1 executor per node)
yarn.executor.cores 5
spark.executor.memory 7g I use only ~50% of the available memory to avoid OOM errors (which otherwise occur very often)
spark.executor.memoryOverhead 8g as suggested here
spark.driver.memory 2g
spark.default.parallelism 45
spark.sql.shuffle.partitions 45
spark.shuffle.manager org.apache.spark.shuffle.rdma.RdmaShuffleManager
spark.shuffle.compress false as suggested here
spark.shuffle.spill.compress false
spark.broadcast.compress false
spark.locality.wait 0
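
For anyone trying to reproduce this setup outside HiBench, the table can be turned into `--conf` arguments for spark-submit. This is a minimal sketch, not the way the test was actually launched: the mapping of the two `yarn.executor.*` rows to `spark.executor.instances` and `spark.executor.cores` is an assumption, and the helper name `to_submit_args` is made up for illustration.

```python
import shlex

# Values copied from the table above.
spark_conf = {
    # Assumption: the yarn.executor.num / yarn.executor.cores rows
    # correspond to these standard Spark properties.
    "spark.executor.instances": "3",
    "spark.executor.cores": "5",
    "spark.executor.memory": "7g",
    "spark.executor.memoryOverhead": "8g",
    "spark.driver.memory": "2g",
    "spark.default.parallelism": "45",
    "spark.sql.shuffle.partitions": "45",
    "spark.shuffle.manager": "org.apache.spark.shuffle.rdma.RdmaShuffleManager",
    "spark.shuffle.compress": "false",
    "spark.shuffle.spill.compress": "false",
    "spark.broadcast.compress": "false",
    "spark.locality.wait": "0",
}


def to_submit_args(conf):
    """Flatten a {property: value} dict into ['--conf', 'k=v', ...]."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(spark_conf)

# Print a shell-safe fragment that can be pasted after `spark-submit`.
print(" ".join(shlex.quote(arg) for arg in submit_args))
```

Switching `spark.shuffle.manager` back to Spark's default in the same dict gives a like-for-like baseline run without RDMA.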

I run tests using HiBench. Dataset profile - huge (about 30 GB). Workload - TeraSort.
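
A rough back-of-the-envelope reading of these settings, assuming "about 30 GB" means roughly 30 GiB of data spread evenly over the 45 shuffle partitions, and that the 3 executors with 5 cores each give 15 concurrent task slots. The real HiBench dataset size and any skew may differ.

```python
import math

# Values from the post; the even split across partitions is an assumption.
dataset_gb = 30        # "about 30 GB" (HiBench "huge" profile)
partitions = 45        # spark.default.parallelism / spark.sql.shuffle.partitions
executors = 3          # 1 executor per node, 3 nodes
cores_per_executor = 5

task_slots = executors * cores_per_executor
mb_per_partition = dataset_gb * 1024 / partitions
waves = math.ceil(partitions / task_slots)

print(f"task slots: {task_slots}")
print(f"data per partition: {mb_per_partition:.0f} MB")
print(f"task waves per stage: {waves}")
```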

Some of the Spark logs:

2020-11-24 12:00:02 INFO YarnClientSchedulerBackend:54 - Stopped
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(0)
2020-11-24 12:00:02 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(2)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(1)
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
java.lang.NullPointerException
    at org.apache.spark.shuffle.rdma.RdmaChannel.processRdmaCmEvent(RdmaChannel.java:345)
    at org.apache.spark.shuffle.rdma.RdmaChannel.stop(RdmaChannel.java:894)
    at org.apache.spark.shuffle.rdma.RdmaNode.lambda$new$0(RdmaNode.java:203)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "RdmaNode connection listening thread" java.lang.RuntimeException: Exception in RdmaNode listening thread java.lang.NullPointerException
    at org.apache.spark.shuffle.rdma.RdmaNode.lambda$new$0(RdmaNode.java:210)
    at java.lang.Thread.run(Thread.java:748)
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO RdmaBufferManager:218 - Rdma buffers allocation statistics:
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 770 buffers of size 4 KB
2020-11-24 12:00:02 INFO disni:201 - deallocPd, pd 1
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO MemoryStore:54 - MemoryStore cleared
2020-11-24 12:00:02 INFO BlockManager:54 - BlockManager stopped
2020-11-24 12:00:02 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2020-11-24 12:00:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2020-11-24 12:00:02 INFO SparkContext:54 - Successfully stopped SparkContext
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Shutdown hook called
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-3a2b9379-56c3-4b40-ab40-e92f03a5c591
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-9e3d837b-5d83-467f-9a12-591ffd0503a8

Yarn logs from one worker

2020-11-24 11:58:51 INFO Executor:54 - Finished task 40.0 in stage 2.0 (TID 355). 1502 bytes result sent to driver
2020-11-24 12:00:02 INFO CoarseGrainedExecutorBackend:54 - Driver commanded a shutdown
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(1)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(2)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(0)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(3)
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 140630589705088
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 94745206395344
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO CoarseGrainedExecutorBackend:54 - Driver from hadoop-master:45735 disconnected during shutdown
2020-11-24 12:00:02 INFO CoarseGrainedExecutorBackend:54 - Driver from hadoop-master:45735 disconnected during shutdown
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(4)
2020-11-24 12:00:02 ERROR RdmaNode:384 - Failed to stop RdmaChannel during 50 ms
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 WARN RdmaChannel:897 - Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 94745206905120
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 94745206402048
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 WARN RdmaChannel:897 - Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 140630593575520
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO RdmaNode:213 - Exiting RdmaNode Listening Server
2020-11-24 12:00:02 INFO RdmaBufferManager:218 - Rdma buffers allocation statistics:
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 380 buffers of size 4 KB
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 87 buffers of size 4096 KB
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 24 buffers of size 1024 KB
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 10 buffers of size 2048 KB
2020-11-24 12:00:02 INFO disni:201 - deallocPd, pd 1
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO MemoryStore:54 - MemoryStore cleared
2020-11-24 12:00:02 INFO BlockManager:54 - BlockManager stopped
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Shutdown hook called
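
To pick the WARN and ERROR entries out of logs like the ones above, a small filter can help. This is only a sketch: it assumes every entry starts with a `YYYY-MM-DD HH:MM:SS LEVEL Logger:line - ` prefix as in these excerpts, it works whether or not the entries are separated by newlines, and the names `STAMP`, `ENTRY` and `parse_entries` are made up for illustration.

```python
import re

# Start of an entry: timestamp followed by a log level.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?:INFO|WARN|ERROR) ")
# Full entry: timestamp, level, logger name, source line, message.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (INFO|WARN|ERROR) (\S+?):(\d+) - (.*)",
    re.S,
)


def parse_entries(text):
    """Split raw log text into one dict per entry."""
    starts = [m.start() for m in STAMP.finditer(text)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        match = ENTRY.match(text[begin:end].strip())
        if match:
            time, level, logger, _line, message = match.groups()
            # Collapse whitespace so multi-line messages stay readable.
            entries.append({"time": time, "level": level,
                            "logger": logger, "message": " ".join(message.split())})
    return entries


# Excerpt copied from the worker log in this post.
sample = (
    "2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(4) "
    "2020-11-24 12:00:02 ERROR RdmaNode:384 - Failed to stop RdmaChannel during 50 ms "
    "2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0 "
    "2020-11-24 12:00:02 WARN RdmaChannel:897 - Failed to get "
    "RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed"
)

entries = parse_entries(sample)
problems = [e for e in entries if e["level"] != "INFO"]
for e in problems:
    print(e["time"], e["level"], e["logger"], "-", e["message"])
```

Stack trace lines that follow an entry are folded into that entry's message, so exceptions printed without a timestamp stay attached to the preceding log line.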

I also see a very long shuffle read time in the Spark UI:

[Image: Spark UI screenshot]

petro-rudenko commented 3 years ago

Hi, this project is archived in favor of SparkUCX. Give it a try!