apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

[Question] How to add customized pod to the spark workers? #579

Closed leletan closed 6 years ago

leletan commented 6 years ago

In our standalone clusters, we installed the Datadog agent on each worker node for reliable stats collection. Wondering if we can do something similar with Spark on Kubernetes.

liyinan926 commented 6 years ago

I think you would need to build and use custom driver and executor images that have datadog installed. See https://github.com/apache-spark-on-k8s/userdocs/blob/master/src/jekyll/running-on-kubernetes.md#docker-images.
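A minimal sketch of what such a custom image could look like. This is not the project's documented procedure: the base image tag and the agent install step below are illustrative placeholders, and a real build would need to add Datadog's package repository and pin versions.

```dockerfile
# Hedged sketch: extend a Spark-on-K8s executor image so the Datadog agent
# runs alongside the executor. Base image tag is illustrative.
FROM kubespark/spark-executor:v2.2.0-kubernetes-0.5.0

# Placeholder install step: in practice, follow Datadog's install docs for
# the image's base distribution (add their apt repo first).
RUN apt-get update && apt-get install -y datadog-agent

# Note: this puts two processes in one container, so the entrypoint would
# need a small wrapper script that starts both the agent and the executor.
```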

leletan commented 6 years ago

Yah, thought about that. But then we would have multiple processes running in one container, which is not idiomatic. Not sure if there is any other workaround.

liyinan926 commented 6 years ago

I'm not familiar with Datadog; can it run as a sidecar container in the same pod?

leletan commented 6 years ago

Yah, running as a sidecar container in the same pod would be ideal. We are using a slightly customized version of https://github.com/DataDog/docker-dd-agent

liyinan926 commented 6 years ago

Spark on Kubernetes does not currently support sidecar containers. But I think this is a use case that https://github.com/liyinan926/spark-operator can support by injecting this sidecar container into the driver and executor pods through the initializer. Is there any configuration (e.g., environment variables) that needs to be done on the container?

leletan commented 6 years ago

Yah, we will need to set a couple of them: API_KEY, HOSTNAME, TAG, etc.
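A hedged sketch of what an executor pod might look like after a sidecar is injected, using the env var names from this thread (the stock Datadog agent image may expect different names, e.g. DD_API_KEY; image tags and pod/secret names are illustrative):

```yaml
# Illustrative only: an executor pod with an injected Datadog sidecar.
apiVersion: v1
kind: Pod
metadata:
  name: spark-exec-1                 # illustrative name
spec:
  containers:
  - name: executor
    image: kubespark/spark-executor:v2.2.0-kubernetes-0.5.0  # illustrative tag
  - name: dd-agent                   # the injected sidecar
    image: datadog/docker-dd-agent:latest
    env:
    - name: API_KEY
      valueFrom:
        secretKeyRef:                # keep the key out of the pod spec
          name: datadog-secret       # illustrative secret name
          key: api-key
    - name: HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName   # report the node, not the pod
    - name: TAG
      value: "env:prod,app:spark"    # illustrative tag values
```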

foxish commented 6 years ago

Sidecar containers should be possible to inject through webhook initializers in K8s 1.9. If you're on an older version and don't have access to k8s alpha features (pod presets or initializers), there's no easy way to accomplish this yet. Agreed with @liyinan926 that this is a good fit for the spark-operator use-case.
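For K8s 1.9+, the injection could be wired up with a mutating admission webhook. A hedged sketch of the registration object only (the webhook server that actually rewrites the pod spec is not shown; service, namespace, and webhook names are illustrative):

```yaml
# Illustrative only: register a webhook that sees pod CREATE requests and
# can inject the sidecar (admissionregistration.k8s.io/v1beta1 as of 1.9).
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: spark-sidecar-injector
webhooks:
- name: sidecar-injector.example.com
  clientConfig:
    service:
      name: sidecar-injector         # illustrative in-cluster service
      namespace: spark
      path: /mutate
    caBundle: ""                     # placeholder: base64 CA cert goes here
  rules:
  - operations: ["CREATE"]
    apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
  failurePolicy: Ignore              # don't block pod creation if webhook is down
```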

leletan commented 6 years ago

Cool. Thanks for the answers, guys.