Mellanox / SparkRDMA

This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Apache License 2.0
240 stars, 70 forks

[error] terasort on spark rdma #10

Closed: li7hui closed this issue 6 years ago

li7hui commented 6 years ago

Hi, I was trying to run the terasort app on SparkRDMA. The terasort code is forked from https://github.com/ehiggs/spark-terasort

I can run the terasort datagen code successfully. However, the program fails when it uses the RDMA utilities.

[image: rdmaerror]

I checked the SparkRDMA install guide, but found no clue.

petro-rudenko commented 6 years ago

Hi, it seems that you are running SparkRDMA in local mode (which basically runs the executor in a separate thread, so no network shuffle is involved). Can you please provide the command that you use to submit the Spark job? Basically, you need to:

  1. Have an RDMA-capable network adapter.
  2. Run in Spark standalone or Spark YARN mode, with several executors on different physical machines.
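A quick way to tell whether a job is headed for local mode is to look at the master URL passed to `spark-submit`. The sketch below is a hypothetical helper (not part of Spark or SparkRDMA) and assumes that only the `local`, `local[N]`, `local[*]` and `local[N,F]` forms count as local mode:

```python
import re

# Hypothetical helper, not part of Spark or SparkRDMA.
# Matches "local" optionally followed by a bracketed thread spec,
# e.g. "local", "local[4]", "local[*]", "local[4,2]".
_LOCAL_MASTER = re.compile(r"^local(\[[^\]]*\])?$")


def is_local_master(master: str) -> bool:
    """Return True if the master URL selects Spark local mode."""
    return bool(_LOCAL_MASTER.match(master.strip()))


if __name__ == "__main__":
    for url in ["local[*]", "yarn", "spark://host1:7077"]:
        mode = "local (no network shuffle)" if is_local_master(url) else "cluster"
        print(f"{url}: {mode}")
```

If this returns True for the master you are using, the executors run inside a single JVM and there is no network shuffle for RDMA to accelerate.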

Let me know if you have questions.

li7hui commented 6 years ago

Hi, thanks for the quick response. I used the following submit command, and the problem has been fixed.

./bin/spark-submit --class com.github.ehiggs.spark.terasort.TeraSort --master yarn --deploy-mode cluster /root/spark-terasort-1.0-SNAPSHOT-jar-with-dependencies.jar hdfs:///input hdfs:///output
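The command above does not show any RDMA settings, so they were presumably set elsewhere (for example in `spark-defaults.conf`). As a rough sketch, the same submission could be assembled with the shuffle manager passed explicitly via `--conf`. The `spark.shuffle.manager` value and the `extraClassPath` keys are taken from my recollection of the SparkRDMA README and should be checked against it; the function name and the SparkRDMA jar path are hypothetical:

```python
from typing import List


def build_submit_command(app_jar: str, main_class: str, app_args: List[str],
                         rdma_jar: str, master: str = "yarn",
                         deploy_mode: str = "cluster") -> List[str]:
    """Build a spark-submit argument list with SparkRDMA settings (a sketch)."""
    conf = {
        # Assumed from the SparkRDMA README; verify before use.
        "spark.shuffle.manager": "org.apache.spark.shuffle.rdma.RdmaShuffleManager",
        "spark.driver.extraClassPath": rdma_jar,
        "spark.executor.extraClassPath": rdma_jar,
    }
    cmd = ["./bin/spark-submit", "--class", main_class,
           "--master", master, "--deploy-mode", deploy_mode]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    # The application jar and its arguments always come last.
    return cmd + [app_jar] + list(app_args)


if __name__ == "__main__":
    cmd = build_submit_command(
        app_jar="/root/spark-terasort-1.0-SNAPSHOT-jar-with-dependencies.jar",
        main_class="com.github.ehiggs.spark.terasort.TeraSort",
        app_args=["hdfs:///input", "hdfs:///output"],
        rdma_jar="/path/to/spark-rdma-jar-with-dependencies.jar",  # hypothetical path
    )
    print(" ".join(cmd))
```

The list form can be handed to `subprocess.run(cmd)` without shell quoting concerns.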

tobegit3hub commented 5 years ago

Hi @li7hui ,

Have you run SparkRDMA in yarn-cluster mode? We have some problems when running jobs with SparkRDMA because it always uses the wrong IP, one that belongs to another NIC. Do you have multiple network interface cards in the server? Or have you set SPARK_LOCAL_IP in spark-env.sh?
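On a multi-NIC host, one way to approach this is to pick the address that lies in the RDMA fabric's subnet and pin Spark to it through `SPARK_LOCAL_IP`, the environment variable Spark reads from `spark-env.sh`. The sketch below only illustrates the selection step; the helper names, the candidate addresses and the subnet are all hypothetical, and whether SparkRDMA honours this variable in a given setup is not confirmed by this thread:

```python
import ipaddress
from typing import Iterable, Optional


def pick_ip_in_subnet(addresses: Iterable[str], subnet: str) -> Optional[str]:
    """Return the first address that belongs to the given subnet, else None."""
    network = ipaddress.ip_network(subnet, strict=False)
    for addr in addresses:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


def spark_env_line(ip: str) -> str:
    """Format the spark-env.sh line that binds Spark to a specific IP."""
    ipaddress.ip_address(ip)  # raises ValueError if the address is malformed
    return f"export SPARK_LOCAL_IP={ip}"


if __name__ == "__main__":
    # Hypothetical example: one management NIC and one RDMA-capable NIC.
    candidates = ["192.168.1.10", "10.0.0.5"]
    chosen = pick_ip_in_subnet(candidates, "10.0.0.0/24")
    if chosen is not None:
        print(spark_env_line(chosen))
```

The printed line would be added to `conf/spark-env.sh` on each node, using that node's own address in the RDMA subnet.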