apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

[ShuffleService] Need for spark.local.dir ? #549

Open echarles opened 6 years ago

echarles commented 6 years ago

When I run a Spark job with spark.shuffle.service.enabled=true (without the spark.local.dir property), I receive an exception:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: spark.local.dir must be provided explicitly when using the external shuffle service in Kubernetes. These directories should map to the paths that are mounted into the external shuffle service pods.
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.deploy.k8s.submit.submitsteps.LocalDirectoryMountConfigurationStep.configureDriver(LocalDirectoryMountConfigurationStep.scala:56)

I then simply add spark.local.dir=/tmp/spark-local (not documented at https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html#dynamic-executor-scaling) and it works fine.

Is this the expected behavior?

liyinan926 commented 6 years ago

Yes, this is expected. Please see https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/submitsteps/LocalDirectoryMountConfigurationStep.scala#L55. We need to update the docs.
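
For readers following along, the behavior described above can be approximated with a minimal sketch. This is written in Python for illustration only and is not the actual Scala code in LocalDirectoryMountConfigurationStep. The function name `resolve_local_dirs`, the dict-based configuration, and the fallback to a default temporary directory when the shuffle service is disabled are all assumptions, not taken from the linked source.

```python
def resolve_local_dirs(conf, default_tmp_dir="/tmp"):
    """Hypothetical helper: decide which local directories a job would use.

    `conf` is a plain dict of Spark property names to string values.
    """
    shuffle_enabled = (
        conf.get("spark.shuffle.service.enabled", "false").strip().lower() == "true"
    )
    local_dir = conf.get("spark.local.dir", "").strip()

    # With the external shuffle service, the directories must be named
    # explicitly so they can match the paths mounted into the shuffle pods.
    if shuffle_enabled and not local_dir:
        raise ValueError(
            "spark.local.dir must be provided explicitly when using the "
            "external shuffle service in Kubernetes."
        )

    if local_dir:
        # spark.local.dir may hold a comma-separated list of directories.
        return [d.strip() for d in local_dir.split(",") if d.strip()]

    # Assumption: without the shuffle service, fall back to a temp directory.
    return [default_tmp_dir]
```

With `spark.shuffle.service.enabled=true` and no `spark.local.dir`, the sketch raises an error analogous to the exception reported above; adding `spark.local.dir=/tmp/spark-local` makes it return that single directory.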

echarles commented 6 years ago

I can open a PR on the userdocs repo for this.

Just to be sure: in the case of the external shuffle service, the directories given in spark.local.dir remain local to the executor pods and are not intended to be shared with the shuffle service. Correct?

Also, the comments in the code say:

When using the external shuffle service, it is risky to assume that the user intends to mount the JVM temporary directory into the pod as a hostPath volume

Why is it riskier when using the external shuffle service? Do those paths need to be the same for all executors?

liyinan926 commented 6 years ago

@mccheah @ash211 @foxish on the semantics around spark.local.dir and shuffle service.