Mellanox / SparkRDMA

This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Apache License 2.0

Spark on YARN: compatibility between different versions? #24

Open chenzhaohangbj opened 5 years ago

chenzhaohangbj commented 5 years ago

When I run Spark on YARN with Spark 2.1.0 and Hadoop 2.7.3, the NodeManager pulls in spark-2.1.0-yarn-shuffle.jar, but when the Spark version is not 2.1.0, the container cannot launch.


petro-rudenko commented 5 years ago

Hi, do you use the external shuffle service?

chenzhaohangbj commented 5 years ago

My Spark conf:

spark.driver.extraClassPath /home/bigdata/local/spark-rdma-3.1-for-spark-2.1.0-jar-with-dependencies.jar
spark.executor.extraClassPath /home/bigdata/local/spark-rdma-3.1-for-spark-2.1.0-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.rdma.RdmaShuffleManager
spark.shuffle.compress false
spark.shuffle.spill.compress false
spark.broadcast.compress false
spark.broadcast.checksum false
spark.locality.wait 0

petro-rudenko commented 5 years ago

The release tarball contains prebuilt jars for Spark versions from 2.0 to 2.4.

petro-rudenko commented 5 years ago

Are you trying to use it with a different Spark version?

chenzhaohangbj commented 5 years ago

Yes, and it does not work.

petro-rudenko commented 5 years ago

@chenzhaohangbj which spark version and which SparkRDMA jar do you use?

chenzhaohangbj commented 5 years ago

Spark 2.1.0, Spark 2.1.1, Spark 2.3.0

petro-rudenko commented 5 years ago

So you need to use:

spark 2.1.0 - spark-rdma-3.1-for-spark-2.1.0-jar-with-dependencies.jar
spark 2.1.1 - spark-rdma-3.1-for-spark-2.1.0-jar-with-dependencies.jar
spark 2.3.0 - spark-rdma-3.1-for-spark-2.3.0-jar-with-dependencies.jar
chenzhaohangbj commented 5 years ago

Which jar does the NodeManager need?

petro-rudenko commented 5 years ago

The NodeManager doesn't need any jar. We don't support the external shuffle service yet.

chenzhaohangbj commented 5 years ago

Are the SparkRDMA shuffle and the default Spark shuffle compatible on the NodeManager?

petro-rudenko commented 5 years ago

If you don't use the external YARN shuffle service, then the NodeManager is only used to launch the Spark application. Spark itself will instantiate the configured shuffle manager. SparkRDMA is fully compatible with the default Spark shuffle.
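
Roughly, Spark 2.x resolves the configured shuffle manager like this (a paraphrased Scala sketch, not the exact SparkEnv source). The class named in spark.shuffle.manager is loaded reflectively on the driver and on each executor, which is why it only has to be on spark.driver/executor.extraClassPath and the NodeManager never needs the SparkRDMA jar:

```scala
import org.apache.spark.SparkConf

// Paraphrased sketch of SparkEnv's shuffle-manager lookup (Spark 2.x).
// Known short names map to the built-in sort-based manager; anything else
// is treated as a fully qualified class name and instantiated reflectively.
def resolveShuffleManagerClass(conf: SparkConf): String = {
  val shortNames = Map(
    "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager",
    "tungsten-sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
  val name = conf.get("spark.shuffle.manager", "sort")
  shortNames.getOrElse(name.toLowerCase, name)
}

// With spark.shuffle.manager=org.apache.spark.shuffle.rdma.RdmaShuffleManager
// this returns the RdmaShuffleManager class name, which Spark then loads
// from the driver/executor classpath.
```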

ilovesxl commented 4 years ago

I notice that it is hardcoded in https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:150 that the Spark external shuffle service only supports two shuffle manager types. So if I edit that code and recompile Spark, can SparkRDMA work with the external shuffle service enabled?
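
For reference, the check I mean looks roughly like this (paraphrased here in Scala; the actual file is Java, and exact names and line numbers differ across Spark versions):

```scala
// Paraphrase of the executor-registration check in
// ExternalShuffleBlockResolver (the real code is Java). Executors whose
// shuffle manager is not one of the known sort-based implementations are
// rejected, so RdmaShuffleManager cannot register with the external
// shuffle service as-is.
val knownManagers = Seq(
  "org.apache.spark.shuffle.sort.SortShuffleManager",
  "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager")

def registerExecutor(shuffleManagerClass: String): Unit = {
  if (!knownManagers.contains(shuffleManagerClass)) {
    throw new UnsupportedOperationException(
      s"Unsupported shuffle manager of executor: $shuffleManagerClass")
  }
  // ... otherwise the service records the executor's local dirs so it can
  // serve that executor's sort-based shuffle files ...
}
```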