apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Create RBAC role YAMLs and documentation #500

Closed: foxish closed this issue 7 years ago

foxish commented 7 years ago

We need RBAC roles for each component: the shuffle service and the resource staging server (RSS). We also need instructions for setting up service accounts for the driver and executor pods.
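For illustration only, here is a rough sketch of what the driver-side pieces could look like: a service account, a namespaced Role, and a RoleBinding tying them together. The names `spark` and `spark-driver`, the `default` namespace, and the exact resource/verb list are assumptions, not something settled in this issue. The manifests are built as plain Python dicts and printed as a JSON `List`, which `kubectl apply -f -` accepts. The `rbac.authorization.k8s.io/v1` API version assumes Kubernetes 1.8 or later; older clusters would need `v1beta1`.

```python
import json

# All three names below are hypothetical placeholders.
NAMESPACE = "default"
SERVICE_ACCOUNT = "spark"
ROLE = "spark-driver"


def driver_rbac_manifests(namespace, service_account, role):
    """Return ServiceAccount, Role, and RoleBinding manifests as plain dicts."""
    sa = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": service_account, "namespace": namespace},
    }
    role_manifest = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role, "namespace": namespace},
        "rules": [
            {
                # "" is the core API group. The resource and verb lists are an
                # assumption about what a driver needs to manage executors.
                "apiGroups": [""],
                "resources": ["pods", "services", "configmaps"],
                "verbs": ["get", "list", "watch", "create", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role, "namespace": namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role,
        },
    }
    return [sa, role_manifest, binding]


def as_list(manifests):
    """Wrap manifests in a v1 List so they can be applied in one call."""
    return {"apiVersion": "v1", "kind": "List", "items": manifests}


if __name__ == "__main__":
    items = driver_rbac_manifests(NAMESPACE, SERVICE_ACCOUNT, ROLE)
    print(json.dumps(as_list(items), indent=2))
```

Assuming the script is saved as `driver_rbac.py` (a made-up file name), `python driver_rbac.py | kubectl apply -f -` would create the objects. The driver can then be pointed at the account with `spark.kubernetes.authenticate.driver.serviceAccountName`.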

liyinan926 commented 7 years ago

@foxish @kimoonkim Regarding RBAC roles for the RSS and the shuffle service: is any customization needed, or are they already covered by the default role/service account their pods use? AFAIK, neither of them needs write access to the API server. Correct me if I'm wrong.

kimoonkim commented 7 years ago

One thing jumps out at me: the shuffle service relies on HostPath volumes, which are not necessarily available to all pods. A PodSecurityPolicy (PSP) can be used together with RBAC to allow that access. For details, see this doc. So I think we should address PSP RBAC rules as well. I'll be happy to dig into this more, as it also applies to kubernetes-HDFS.
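To make the PSP idea concrete, here is a minimal sketch, under assumptions, of a policy that permits HostPath volumes under one prefix, plus a ClusterRole granting the `use` verb on that policy and a RoleBinding for the shuffle service's service account. The policy name, path prefix, namespace, and service account name are all hypothetical. The `policy/v1beta1` PodSecurityPolicy API used here was removed in Kubernetes 1.25, so this only applies to clusters that still serve it.

```python
import json


def shuffle_psp_manifests(psp_name, host_path_prefix, namespace, service_account):
    """Return PSP, ClusterRole, and RoleBinding manifests as plain dicts."""
    psp = {
        "apiVersion": "policy/v1beta1",
        "kind": "PodSecurityPolicy",
        "metadata": {"name": psp_name},
        "spec": {
            # Allow hostPath plus a few common volume types (an assumption).
            "volumes": ["hostPath", "emptyDir", "secret", "configMap"],
            "allowedHostPaths": [{"pathPrefix": host_path_prefix}],
            # These four strategies are required fields. RunAsAny is the most
            # permissive choice and is used here only to keep the sketch short.
            "seLinux": {"rule": "RunAsAny"},
            "runAsUser": {"rule": "RunAsAny"},
            "supplementalGroups": {"rule": "RunAsAny"},
            "fsGroup": {"rule": "RunAsAny"},
        },
    }
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": psp_name + "-user"},
        "rules": [
            {
                "apiGroups": ["policy"],
                "resources": ["podsecuritypolicies"],
                "resourceNames": [psp_name],
                # "use" is the verb that lets a subject run pods under a PSP.
                "verbs": ["use"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": psp_name + "-user", "namespace": namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": psp_name + "-user",
        },
    }
    return [psp, cluster_role, binding]


if __name__ == "__main__":
    # Every argument here is a placeholder value.
    items = shuffle_psp_manifests(
        "spark-shuffle-hostpath", "/tmp/spark-local", "default", "spark-shuffle"
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Using a RoleBinding to a ClusterRole keeps the grant scoped to a single namespace, which seems like the safer default for a service that mounts host directories.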

liyinan926 commented 7 years ago

@kimoonkim I also found this doc, which seems related.

kimoonkim commented 7 years ago

Ah. That doc seems very relevant. Thanks for sharing it!

kimoonkim commented 7 years ago

This is probably outside the scope of this issue, but I was wondering if we should also think about human accounts and the role bindings they need to run Spark jobs and these other services.

Personally, I use the cluster admin account, but not every user in a large org will have access to that.

liyinan926 commented 7 years ago

I agree that we should think about non-admin user accounts, which are likely much more common in production environments with large clusters.
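As a starting point for that discussion, one option (an assumption, not a decision) would be to bind each non-admin user to the built-in `edit` ClusterRole inside a single namespace, so they can submit jobs there without cluster-wide rights. The sketch below builds such a RoleBinding as a dict and prints it as JSON. The user name `alice` and the namespace `spark-jobs` are made up, and whether `edit` is the right level of access for submitting Spark jobs would still need to be checked.

```python
import json


def user_role_binding(user, namespace, cluster_role="edit"):
    """Return a RoleBinding granting `cluster_role` to `user` in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": "{}-{}".format(user, cluster_role),
            "namespace": namespace,
        },
        "subjects": [
            {
                # User subjects live in the RBAC API group, unlike
                # ServiceAccount subjects, which carry a namespace instead.
                "kind": "User",
                "name": user,
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
    }


if __name__ == "__main__":
    # "alice" and "spark-jobs" are hypothetical values.
    print(json.dumps(user_role_binding("alice", "spark-jobs"), indent=2))
```

Because this is a RoleBinding rather than a ClusterRoleBinding, the permissions of `edit` apply only within the named namespace.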