Open · mccheah opened this issue 7 years ago
@foxish @erikerlandson @ash211 as discussed yesterday.
One nuance is that one could want to include jars only in the driver Docker image and have them shipped over to the executors - but we can assume the simplest case for now. This is effectively allowing `spark.driver.extraClassPath` and `spark.executor.extraClassPath` to be provided in the Docker image.
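Concretely, providing these through the image would replace having to pass the same information at submit time. A purely illustrative sketch of the submit-time form this avoids (the jar paths and names here are made up, not a real layout):

```shell
# Illustrative only: the submitter must know where the jars live inside the
# image and repeat those locations on every submission, via local:// URIs
# and the extra-classpath settings.
spark-submit \
  --jars local:///opt/app/jars/my-app-deps.jar \
  --conf spark.driver.extraClassPath=/opt/app/jars/my-app-deps.jar \
  --conf spark.executor.extraClassPath=/opt/app/jars/my-app-deps.jar \
  ...
```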
Currently, when a custom Docker image is provided with an application's jars mounted in it, the user has to explicitly specify the location of these jars in the Docker image, providing paths with `local://` URI schemes to do so. In practice this seems redundant: multiple users that all submit the same application would have to specify the exact same set of URIs, and the user would have to know where the jars live in the Docker image.

One idea is for the Docker images to support the presence of an environment variable, say `SPARK_EXTRA_CLASSPATH`. The environment variable could be set on both the driver and the executor images. When the variable is set, the `CMD` of the base Spark image would add entries from `SPARK_EXTRA_CLASSPATH` to the driver and executor classpath.

For example, the base driver Docker image that we can provide could set up the classpath from the environment variable
(`spark-submit` provides everything except `SPARK_EXTRA_CLASSPATH`) - and a custom Docker image implementation could then set `SPARK_EXTRA_CLASSPATH` to point at its own jars.

We will be changing the structure of the Docker images when submission is redone with the submission staging server, so we ought to include this change in that iteration as well.