apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Bypass init-containers if `spark.jars` and `spark.files` are empty or contain only `local://` URIs #338

Closed mccheah closed 7 years ago

mccheah commented 7 years ago

If all dependencies are installed in the Docker images, the init-container will have no work to do, so we shouldn't run it.
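
A minimal sketch of the bypass check being proposed here. The helper name `needsInitContainer` and the plain `Seq[String]` inputs are illustrative assumptions, not the actual patch:

```scala
import java.net.URI

object InitContainerCheck {
  // Hypothetical helper (not the actual patch): an init-container is only
  // needed if some submitted jar or file must be fetched at pod startup.
  // URIs with the local:// scheme point at files already baked into the
  // Docker image, so they require no download step.
  def needsInitContainer(sparkJars: Seq[String], sparkFiles: Seq[String]): Boolean =
    (sparkJars ++ sparkFiles).exists { uriString =>
      // A missing scheme is treated as a dependency that must be staged.
      Option(URI.create(uriString).getScheme).forall(_ != "local")
    }
}
```

For example, `needsInitContainer(Seq("local:///opt/spark/jars/app.jar"), Nil)` returns `false`, so the submission client could skip attaching the init-container entirely.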

chenchun commented 7 years ago

+1 for this. Adding an init-container may slow down pod launching. What about making the driver container download all dependencies itself?

mccheah commented 7 years ago

@chenchun the preferable design is to use the init-container because it allows the driver container to be completely generic. Or to put it another way (see the sketch after this list):

1. Users who write custom Docker images don't need to call a specific class; they can just run the application main class directly.
2. When we write multiple driver runtime implementations (e.g. Python), they can all share the same init-container without modifying their commands.
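
To illustrate the second point, here is a hedged sketch using the fabric8 kubernetes-client builders (which this project uses for pod construction). The image names and exact wiring are assumptions, and it presumes a client/cluster version that exposes `initContainers` directly on the pod spec rather than via the older beta annotation:

```scala
import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder}

// The same generic init-container can be attached to any driver runtime
// (JVM, Python, ...) because only the main container's image changes.
val dependencyFetcher = new ContainerBuilder()
  .withName("spark-init")                   // shared across all runtimes
  .withImage("spark-init:latest")           // assumed image name
  .withArgs("/etc/spark-init/spark-init.properties")
  .build()

val driverPod = new PodBuilder()
  .withNewMetadata()
    .withName("spark-driver")
  .endMetadata()
  .withNewSpec()
    .withInitContainers(dependencyFetcher)  // runs before the driver starts
    .addNewContainer()
      .withName("spark-kubernetes-driver")
      .withImage("spark-driver:latest")     // assumed image name
    .endContainer()
  .endSpec()
  .build()
```

Swapping in a Python driver image would change only the main container; the init-container and its arguments stay identical.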

We can look into improving Kubernetes itself in terms of speeding up init-container execution. When nodes cache the init-container Docker image, performance will improve on subsequent runs, but there is still the overhead of starting the container itself.