apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Dockerfile publishing #566

Open foxish opened 6 years ago

foxish commented 6 years ago

From the ongoing thread on Docker images at http://apache-spark-developers-list.1001551.n3.nabble.com/Publishing-official-docker-images-for-KubernetesSchedulerBackend-td22928.html

Currently, we have a wide array of Dockerfiles that are all based on spark-base, with minor customizations. There is some discussion on publishing those.

Our high-level goal, I think, as articulated on the thread, is that we publish canonical images that serve both as a complete image for most Spark applications and as a stable substrate on which to build customizations for the subset of applications that need it.

Thoughts? Comments? cc/ @apache-spark-on-k8s/contributors @felixcheung @tnachen

foxish commented 6 years ago

There are some things we can do to simplify/unify some of those images (for example, by moving the CMD into the scheduler backend code). I'm unsure what we might gain by doing that, since the images aren't particularly k8s-specific at this time anyway, and one could in theory set the right env vars and reuse those images.

erikerlandson commented 6 years ago

As I mentioned in the SIG meeting discussion, I think moving the CMD back into the scheduler code is not a good idea. For one thing, users could then no longer customize it in their own container images.

The strategy of unifying a spark-base image with Mesos seems like a good one. I would expect any other deviations (kube, Mesos, or anything else) to be relatively thin modifications of spark-base.

felixcheung commented 6 years ago

I think it makes sense to have one official Spark image.

As of now, I don't see anything in k8s spark-base that is specific to k8s.

Mesos does have a Dockerfile in the Spark codebase (but no image in the release), and it uses a Mesosphere-specific base image (FROM). So, all in all, we might not be able to say we are releasing one image or Dockerfile that replaces the others for every cluster type or resource manager, but at least we could say that this new image or Dockerfile is not specific to k8s use.

As for the mail thread, I think we could re-articulate the built-in capability for customization with Docker, with this spark-base serving as the base image.