gilbertc777 opened this issue 5 years ago
Hi, sure, I'd be glad to assist. Can you please share your setup (number of nodes, hardware type, what workload you are running, the Spark/HiBench config you are using, etc.)?
Hi Peter,
I am using HiBench to run the TeraSort and WordCount benchmarks.
The setup is as follows:
10 nodes with 36 cores each @ 3 GHz and 384 GB RAM
100 Gb RoCE over Mellanox ConnectX-4 cards
I will upload the configs in a few.
Thanks for the help!
SPARK_MASTER_HOST=hpc-cluster-cn1-rdma
SPARK_MASTER_IP=192.18.18.1
SPARK_LOCAL_DIRS=/localvolume/hadoop/spark/tmp
SPARK_WORKER_DIR=/localvolume/hadoop/spark/worker
SPARK_LOCAL_DIRS=/localvolume/hadoop/spark/local
RDMA_INTERFACE=mlx5_0
--
HiBench spark.conf
# Spark home
hibench.spark.home $SPARK_HOME
# Spark master
# standalone mode: spark://xxx:7077
# YARN mode: yarn-client
hibench.spark.master spark://hpc-cluster-cn1-rdma:7077
#hibench.spark.master yarn
spark.shuffle.compress false
spark.shuffle.spill.compress false
spark.driver.extraClassPath /localvolume/nfs-share/hadoop-rdma/applications/mellanox/spark-2.4.0-bin-hadoop2.7/rdma/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
spark.executor.extraClassPath /localvolume/nfs-share/hadoop-rdma/applications/mellanox/spark-2.4.0-bin-hadoop2.7/rdma/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.rdma.RdmaShuffleManager
spark.roce.enabled false
spark.ib.enabled false
spark.shuffle.rdma.device.name mlx5_0
spark.shuffle.rdma.pconn.enabled true
spark.broadcast.compress false
spark.broadcast.checksum false
spark.locality.wait 0
#spark.shuffle.rdma.chunk.size 2097152
#spark.shuffle.rdma.memory.slab.size 134217728
# executor number and cores when running on Yarn
hibench.yarn.executor.num 2
hibench.yarn.executor.cores 2
# executor and driver memory in standalone & YARN mode
spark.executor.memory 4g
spark.driver.memory 4g
--
[opc@hpc-cluster-cn1-rdma conf]$ cat hadoop.conf
# Hadoop home
hibench.hadoop.home $HADOOP_HOME
# The path of hadoop executable
hibench.hadoop.executable ${hibench.hadoop.home}/bin/hadoop
# Hadoop configuration directory
hibench.hadoop.configure.dir ${hibench.hadoop.home}/etc/hadoop
# The root HDFS path to store HiBench data
hibench.hdfs.master hdfs://hpc-cluster-cn1-rdma:9000
# Hadoop release provider. Supported value: apache, cdh5, hdp
hibench.hadoop.release apache
--
Top of the hibench.conf
[opc@hpc-cluster-cn1-rdma conf]$ cat hibench.conf
# Data scale profile. Available values are tiny, small, large, huge, gigantic and bigdata.
# The definition of these profiles can be found in the workload's conf file, i.e. conf/workloads/micro/wordcount.conf
hibench.scale.profile bigdata
# Mapper number in hadoop, partition number in Spark
hibench.default.map.parallelism 1000
# Reducer number in hadoop, shuffle partition number in Spark
hibench.default.shuffle.parallelism 15000
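As a side note (a hedged sanity check, not part of the original configs): one quick way to confirm the RDMA shuffle plugin is actually picked up for a run is to grep the driver/executor logs for the plugin's classes, for example:

grep -RE "rdma\.(RdmaNode|RdmaShuffleManager)" /localvolume/hadoop/spark/worker

The path here assumes the SPARK_WORKER_DIR from the spark-env.sh above (standalone mode); in YARN mode the executor logs live under the NodeManager log directories instead.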
@gilbertc777 thanks, what's the result you get for TCP and RDMA?
ScalaSparkTerasort 2019-02-12 23:58:52 32000000000 112.184 285245667 28524566
ScalaSparkTerasort 2019-02-13 00:02:02 32000000000 122.759 260673351 26067335
The first test ran over TCP, the second with RDMA enabled. Note that the RDMA runtime is higher.
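(For reference, assuming the standard hibench.report column layout, those fields are: Type, Date, Time, Input_data_size, Duration(s), Throughput(bytes/s), Throughput/node. The arithmetic is consistent with that reading: 32000000000 bytes / 112.184 s ≈ 285245667 bytes/s overall, and divided across the 10 nodes ≈ 28524566 bytes/s per node. By duration, the RDMA run was roughly 9% slower than the TCP run.)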
I see.
Hi, I have the same problem. My configuration is almost the same. The following is part of the bench log. Can you help me?
19/07/11 14:55:11 INFO cluster.YarnClientSchedulerBackend: Stopped
19/07/11 14:55:11 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(11)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(14)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(3)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(12)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(7)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(6)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(16)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(8)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(1)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(15)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(13)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(5)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(2)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(4)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(0)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(9)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(17)
19/07/11 14:55:11 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(10)
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
java.lang.NullPointerException
    at org.apache.spark.shuffle.rdma.RdmaChannel.processRdmaCmEvent(RdmaChannel.java:345)
    at org.apache.spark.shuffle.rdma.RdmaChannel.stop(RdmaChannel.java:894)
    at org.apache.spark.shuffle.rdma.RdmaNode.lambda$new$0(RdmaNode.java:203)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "RdmaNode connection listening thread" java.lang.RuntimeException: Exception in RdmaNode listening thread
java.lang.NullPointerException
    at org.apache.spark.shuffle.rdma.RdmaNode.lambda$new$0(RdmaNode.java:210)
    at java.lang.Thread.run(Thread.java:748)
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO ibm.disni: disconnect, id 0
19/07/11 14:55:11 INFO rdma.RdmaBufferManager: Rdma buffers allocation statistics:
19/07/11 14:55:11 INFO rdma.RdmaBufferManager: Pre allocated 0, allocated 4611 buffers of size 4 KB
19/07/11 14:55:11 INFO ibm.disni: deallocPd, pd 1
19/07/11 14:55:11 INFO ibm.disni: destroyCmId, id 0
19/07/11 14:55:11 INFO ibm.disni: destroyEventChannel, channel 0
19/07/11 14:55:11 INFO memory.MemoryStore: MemoryStore cleared
19/07/11 14:55:11 INFO storage.BlockManager: BlockManager stopped
19/07/11 14:55:11 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/07/11 14:55:11 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/07/11 14:55:11 INFO spark.SparkContext: Successfully stopped SparkContext
19/07/11 14:55:11 INFO util.ShutdownHookManager: Shutdown hook called
Some of the yarn log:
19/07/11 16:19:43 INFO rdma.RdmaNode: Starting RdmaNode Listening Server, listening on: /8.8.8.4:50189
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:60996 in 253 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:56128 in 153 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:43667 in 153 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:49283 in 161 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:48335 in 154 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:35004 in 154 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.5:53523 in 156 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:57867 in 172 ms
19/07/11 16:19:43 INFO rdma.RdmaNode: Established connection to /8.8.8.4:35765 in 181 ms
19/07/11 16:19:45 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 68 to driver took 6 ms
19/07/11 16:19:45 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 16 to driver took 0 ms
19/07/11 16:19:45 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 49 to driver took 0 ms
19/07/11 16:19:45 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 32 to driver took 1 ms
19/07/11 16:19:45 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 87 to driver took 2 ms
19/07/11 16:19:46 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 35 to driver took 0 ms
19/07/11 16:19:46 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 47 to driver took 1 ms
19/07/11 16:19:46 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 26 to driver took 0 ms
19/07/11 16:19:46 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 61 to driver took 0 ms
19/07/11 16:19:46 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 53 to driver took 1 ms
19/07/11 16:19:47 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 98 to driver took 1 ms
19/07/11 16:19:47 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 108 to driver took 1 ms
19/07/11 16:19:47 INFO rdma.RdmaShuffleManager: RDMA write map task output for mapId: 112 to driver took 0 ms
19/07/11 16:19:48 INFO rdma.RdmaShuffleManager: Getting MapTaskOutput table for shuffleId: 0 of size 4096 from driver
19/07/11 16:19:48 INFO rdma.RdmaShuffleManager: RDMA read mapTaskOutput table for shuffleId: 0 took 3 ms
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(7)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(6)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(0)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(8)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(2)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(3)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(9)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(1)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(5)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(12)
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(4)
19/07/11 16:19:57 ERROR rdma.RdmaNode: Failed to stop RdmaChannel during 50 ms
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 WARN rdma.RdmaChannel: Failed to get RDMA_CM_EVENT_DISCONNECTED: getCmEvent() failed
19/07/11 16:19:57 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(14)
19/07/11 16:19:57 ERROR rdma.RdmaNode: Failed to stop RdmaChannel during 50 ms
19/07/11 16:19:58 INFO rdma.RdmaNode: Exiting RdmaNode Listening Server
19/07/11 16:19:58 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(16)
19/07/11 16:19:58 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(13)
19/07/11 16:19:58 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(15)
19/07/11 16:19:58 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(11)
19/07/11 16:19:58 INFO rdma.RdmaChannel: Stopping RdmaChannel RdmaChannel(10)
19/07/11 16:19:58 ERROR rdma.RdmaNode: Failed to stop RdmaChannel during 50 ms
19/07/11 16:19:58 INFO rdma.RdmaBufferManager: Rdma buffers allocation statistics:
19/07/11 16:19:58 INFO rdma.RdmaBufferManager: Pre allocated 0, allocated 342 buffers of size 4 KB
19/07/11 16:19:58 INFO rdma.RdmaBufferManager: Pre allocated 0, allocated 535 buffers of size 512 KB
SLF4J: Found binding in [jar:file:/share/spark_rdma/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class
So these errors are during the shutdown phase. Does the job itself complete successfully?
Does this mean it completed successfully?
19/07/11 16:33:52 INFO DAGScheduler: ResultStage 2 (runJob at SparkHadoopMapReduceWriter.scala:88) finished in 8.207 s
19/07/11 16:33:52 INFO DAGScheduler: Job 1 finished: runJob at SparkHadoopMapReduceWriter.scala:88, took 13.799956 s
19/07/11 16:33:55 INFO SparkHadoopMapReduceWriter: Job job_20190711163338_0006 committed.
Yes, the job finished. You can see the job history by setting:
spark.eventLog.enabled true
spark.eventLog.dir PATH_FOR_HISTORY_FILES
I set the lines below in spark-defaults.conf, but there were no logs in the directory. I run in yarn-client mode. Do you have another way to check this issue?
spark.eventLog.enabled true
spark.eventLog.dir /tmp/spark_logs
https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact
You need to start the history server and set spark.eventLog.dir to hdfs:///tmp/spark_logs.
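For example, something along these lines (a minimal sketch; the HDFS path is just an illustration and must exist and be writable):

# in spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///tmp/spark_logs
spark.history.fs.logDirectory hdfs:///tmp/spark_logs

# create the log directory and start the history server (UI on port 18080 by default)
hdfs dfs -mkdir -p /tmp/spark_logs
$SPARK_HOME/sbin/start-history-server.sh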
Hi Petro, I still have the problem. Can you help me analyze it? Thanks. I use 2 ARM machine nodes to run TeraSort with the "huge" profile (about 29 GB of data).
cpu info:
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 1
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: Cavium
Model: 1
Model name: ThunderX2 99xx
Stepping: 0x1
BogoMIPS: 400.00
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 32768K
NUMA node0 CPU(s): 0-27
NUMA node1 CPU(s): 28-55
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm
Net card:
Infiniband device 'mlx5_1' port 1 status:
default gid: fe80:0000:0000:0000:9a03:9bff:fe68:4e47
base lid: 0x0
sm lid: 0x0
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
link_layer: Ethernet
driver: mlx5_core
version: 4.6-1.0.1
firmware-version: 16.23.1020 (MT_0000000090)
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
OS: Ubuntu 18.04
Java version: 1.8.0_211
Hadoop version: 2.7.4
Spark version: 2.2.3
DiSNI version: v1.7
Top of hibench.conf:
hibench.scale.profile huge
hibench.default.map.parallelism 200
hibench.default.shuffle.parallelism 120
hadoop.conf
hibench.hadoop.home /home/linux/xjm/spark_rdma/hadoop-2.7.4
hibench.hadoop.executable ${hibench.hadoop.home}/bin/hadoop
hibench.hadoop.configure.dir ${hibench.hadoop.home}/etc/hadoop
hibench.hdfs.master hdfs://wls-arm-cavium05:9000
hibench.hadoop.release apache
The test results (durations): no-rdma: 405, rdma: 401.
The following are the configuration files and executor logs. Does my configuration have any problems?
stderr-excutor1-no-rdma.txt stderr-excutor2-rdma.txt stderr_excutor1-rdma.txt bench.log spark.conf.txt core-site.xml.txt hdfs-site.xml.txt spark-defauts.conf.txt spark-env.sh.txt
@jiameixie, 2 executors is too few to get a real advantage of RDMA over TCP. Try increasing the number of executors, the number of cores, and the workload size (at least gigantic or bigdata).
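For example (illustrative values only, not tuned for the 2-node ThunderX2 setup; these are the same HiBench/Spark properties already shown in the configs above):

hibench.scale.profile gigantic
hibench.yarn.executor.num 8
hibench.yarn.executor.cores 6
spark.executor.memory 24g
spark.driver.memory 8g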
Hi all,
I am running the HiBench TeraSort benchmark against RDMA Spark. My current results show that TCP is performing better than RDMA. Any thoughts on configs to look into? Happy to post the config settings currently in place as well to help diagnose.