apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

allow spark driver find shuffle pods in specified namespace #357

Closed honkiko closed 7 years ago

honkiko commented 7 years ago

The conf property spark.kubernetes.shuffle.namespace specifies the namespace of the shuffle pods.

In normal cases, only one "shuffle daemonset" is deployed and shared by all spark pods.

The Spark driver should be able to list and watch shuffle pods in the namespace specified by the user.
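
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the helper name `resolve_shuffle_namespace` is hypothetical, and the fallback to the driver's own namespace when the property is unset is an assumption.

```python
# Property name taken from the PR description.
SHUFFLE_NAMESPACE_KEY = "spark.kubernetes.shuffle.namespace"


def resolve_shuffle_namespace(conf, driver_namespace):
    """Return the namespace in which to list and watch shuffle pods.

    Hypothetical helper. Assumption: if the property is missing or blank,
    fall back to the namespace the driver itself runs in.
    """
    configured = conf.get(SHUFFLE_NAMESPACE_KEY, "").strip()
    return configured or driver_namespace


if __name__ == "__main__":
    conf = {SHUFFLE_NAMESPACE_KEY: "shuffle-ns"}
    print(resolve_shuffle_namespace(conf, "default"))
```

With the property set, the driver would query the configured namespace; without it, the sketch stays in the driver's namespace, which is why no extra authorization would be needed in that case.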

Note: by default, the Spark driver pod is not authorized to list and watch shuffle pods in another namespace, so that permission must be granted explicitly. For example, the following ABAC policy works.

{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"group": "system:serviceaccounts", "namespace": "SHUFFLE_NAMESPACE", "resource": "pods", "readonly": true}}

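Since a Kubernetes ABAC policy file holds one JSON object per line, it can help to generate the entry rather than hand-edit it. The sketch below renders the policy above for a given namespace; the function name `abac_readonly_pods_policy` is hypothetical, and the field values simply mirror the example policy.

```python
import json


def abac_readonly_pods_policy(shuffle_namespace):
    """Render a one-line ABAC policy granting all service accounts
    read-only access to pods in the given namespace.

    Hypothetical helper; the fields mirror the example policy above.
    """
    policy = {
        "apiVersion": "abac.authorization.kubernetes.io/v1beta1",
        "kind": "Policy",
        "spec": {
            "group": "system:serviceaccounts",
            "namespace": shuffle_namespace,
            "resource": "pods",
            "readonly": True,
        },
    }
    # json.dumps without indent emits a single line, as the policy file expects.
    return json.dumps(policy)


if __name__ == "__main__":
    print(abac_readonly_pods_policy("SHUFFLE_NAMESPACE"))
```

Note that this grants read access to every service account in the cluster; a narrower rule for the driver's own service account may be preferable where that matters.
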
What changes were proposed in this pull request?

Allow the Spark driver to find shuffle pods in the specified namespace.

How was this patch tested?

Tested with the SparkPi example.

foxish commented 7 years ago

Thanks for the PR @honkiko. This LGTM.

ash211 commented 7 years ago

Thanks for the contribution @honkiko ! We're always happy to see new names in this project.

It looks like you're fixing a bug here, given that dsNamespace was never used before. Does this change allow running shuffle pods in a non-default namespace?