Hello!
I found out that SparkRDMA works correctly on our cluster only when 2 worker nodes interact with each other. The number of executors on each node doesn't matter (I tried 1, 2, and 4 executors per node). When 3 or more nodes take part in task processing, some errors occur (logs below). The job ends successfully, but it takes much more time than without RDMA. How can I solve this problem? When only 2 nodes take part in task processing, I can see the performance benefit of SparkRDMA.
Cluster
4 nodes (all nodes can be workers)

| Component | Description |
| --- | --- |
| Storage | Samsung SSD 970 EVO Plus 250GB (NVMe) |
| OS | Kubuntu 20.04 LTS |
| Network switch | Huawei S5720 36-C with switching capacity of 598 Gbit/s |
| Network physical layer | single mode optical fiber |
| Network adapter | Mellanox ConnectX-4 Lx EN, 10 GbE single port SFP+ |
| Memory | 32 GB |
| CPU | Intel Core i7-9700 CPU @ 3.00GHz, 8 cores with 1 thread per core |
Yarn configurations (Hadoop 3.1.3)

| Configuration | Value |
| --- | --- |
| yarn.nodemanager.resource.memory-mb | 27648 |
| yarn.scheduler.maximum-allocation-mb | 27648 |
| yarn.scheduler.minimum-allocation-mb | 13824 |
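As a rough sanity check of these values (my own arithmetic, not an exact model of the YARN scheduler): with container requests rounded up towards the 13824 MB minimum allocation, each 27648 MB node can host at most 2 containers.

```scala
// Rough sanity check of the YARN settings above (simplified, not an exact YARN model):
// the scheduler rounds container requests up towards the minimum allocation,
// so each node can host at most nodeMemory / minAllocation containers.
object YarnContainerCheck {
  def main(args: Array[String]): Unit = {
    val nodeMemoryMb      = 27648 // yarn.nodemanager.resource.memory-mb
    val minAllocationMb   = 13824 // yarn.scheduler.minimum-allocation-mb
    val containersPerNode = nodeMemoryMb / minAllocationMb
    println(s"At most $containersPerNode containers per node") // prints 2
  }
}
```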
Spark configurations (Spark 2.4.0)
SparkRDMA-3.1
DiSNI-1.7

| Configuration | Value | Description |
| --- | --- | --- |
| yarn.executor.num | 3 | For this test I use 3 nodes (1 executor per node) |
| yarn.executor.cores | 5 | |
| spark.executor.memory | 7g | I use only ~50% of the available memory to prevent OOM errors (they happen very often otherwise) |
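For illustration, a minimal sketch of how the values above could be expressed as plain Spark properties. The mapping of HiBench's yarn.executor.num / yarn.executor.cores to spark.executor.instances / spark.executor.cores, and the RdmaShuffleManager class name, are my assumptions based on the org.apache.spark.shuffle.rdma package that appears in the logs; in practice the settings are passed via the HiBench Spark config.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Minimal sketch only: how the settings above could map onto a SparkConf.
// The shuffle-manager class name is assumed from the org.apache.spark.shuffle.rdma
// package seen in the logs below; HiBench normally injects these properties itself.
object RdmaConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TeraSort over SparkRDMA")
      .set("spark.executor.instances", "3")  // 1 executor per node, 3 nodes
      .set("spark.executor.cores", "5")
      .set("spark.executor.memory", "7g")    // ~50% of the 13824 MB YARN container
      .set("spark.shuffle.manager", "org.apache.spark.shuffle.rdma.RdmaShuffleManager")

    val spark = SparkSession.builder().config(conf).getOrCreate()
    // ... workload runs here ...
    spark.stop()
  }
}
```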
I run tests using HiBench. Dataset profile - huge (about 30 GB). Workload - TeraSort.
Some of the Spark driver logs:
2020-11-24 12:00:02 INFO YarnClientSchedulerBackend:54 - Stopped
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(0)
2020-11-24 12:00:02 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(2)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(1)
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
java.lang.NullPointerException
    at org.apache.spark.shuffle.rdma.RdmaChannel.processRdmaCmEvent(RdmaChannel.java:345)
    at org.apache.spark.shuffle.rdma.RdmaChannel.stop(RdmaChannel.java:894)
    at org.apache.spark.shuffle.rdma.RdmaNode.lambda$new$0(RdmaNode.java:203)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "RdmaNode connection listening thread" java.lang.RuntimeException: Exception in RdmaNode listening thread
java.lang.NullPointerException
    at org.apache.spark.shuffle.rdma.RdmaNode.lambda$new$0(RdmaNode.java:210)
    at java.lang.Thread.run(Thread.java:748)
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO RdmaBufferManager:218 - Rdma buffers allocation statistics:
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 770 buffers of size 4 KB
2020-11-24 12:00:02 INFO disni:201 - deallocPd, pd 1
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO MemoryStore:54 - MemoryStore cleared
2020-11-24 12:00:02 INFO BlockManager:54 - BlockManager stopped
2020-11-24 12:00:02 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2020-11-24 12:00:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2020-11-24 12:00:02 INFO SparkContext:54 - Successfully stopped SparkContext
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Shutdown hook called
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-3a2b9379-56c3-4b40-ab40-e92f03a5c591
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-9e3d837b-5d83-467f-9a12-591ffd0503a8
YARN logs from one worker:
2020-11-24 11:58:51 INFO Executor:54 - Finished task 40.0 in stage 2.0 (TID 355). 1502 bytes result sent to driver
2020-11-24 12:00:02 INFO CoarseGrainedExecutorBackend:54 - Driver commanded a shutdown
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(1)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(2)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(0)
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(3)
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 140630589705088
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 94745206395344
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO CoarseGrainedExecutorBackend:54 - Driver from hadoop-master:45735 disconnected during shutdown
2020-11-24 12:00:02 INFO CoarseGrainedExecutorBackend:54 - Driver from hadoop-master:45735 disconnected during shutdown
2020-11-24 12:00:02 INFO RdmaChannel:874 - Stopping RdmaChannel RdmaChannel(4)
2020-11-24 12:00:02 ERROR RdmaNode:384 - Failed to stop RdmaChannel during 50 ms
2020-11-24 12:00:02 INFO disni:257 - disconnect, id 0
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 WARN RdmaChannel:897 - Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 94745206905120
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 94745206402048
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 WARN RdmaChannel:897 - Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
2020-11-24 12:00:02 INFO disni:285 - destroyQP, id 0
2020-11-24 12:00:02 INFO disni:214 - destroyCQ, cq 140630593575520
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:189 - destroyCompChannel, compChannel 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO RdmaNode:213 - Exiting RdmaNode Listening Server
2020-11-24 12:00:02 INFO RdmaBufferManager:218 - Rdma buffers allocation statistics:
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 380 buffers of size 4 KB
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 87 buffers of size 4096 KB
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 24 buffers of size 1024 KB
2020-11-24 12:00:02 INFO RdmaBufferManager:222 - Pre allocated 0, allocated 10 buffers of size 2048 KB
2020-11-24 12:00:02 INFO disni:201 - deallocPd, pd 1
2020-11-24 12:00:02 INFO disni:274 - destroyCmId, id 0
2020-11-24 12:00:02 INFO disni:263 - destroyEventChannel, channel 0
2020-11-24 12:00:02 INFO MemoryStore:54 - MemoryStore cleared
2020-11-24 12:00:02 INFO BlockManager:54 - BlockManager stopped
2020-11-24 12:00:02 INFO ShutdownHookManager:54 - Shutdown hook called
I can also see a very large shuffle read time in the Spark UI.