Spark comes with a history server, which provides a great UI with a lot of information about Spark job execution (event timeline, stage details, etc.). Details can be found on the Spark monitoring page.
I've modified gettyimages/docker-spark so that the history server can be run with the docker-compose up command. With this setup, its UI will be running at http://${YOUR_DOCKER_HOST}:18080.
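Roughly, the history server service in docker-compose.yml looks like the sketch below (the service name and exact definition are my assumptions, modeled on the master/worker services in that repo):

history-server:
  image: gettyimages/spark
  command: bin/spark-class org.apache.spark.deploy.history.HistoryServer
  ports:
    - 18080:18080
  volumes:
    - ./spark-events:/tmp/spark-events

The volumes line is what makes the event logs written by the driver visible to the history server (see below).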
To use Spark's history server, you have to tell your Spark driver:
to log events: spark.eventLog.enabled true (it's false by default)
the log directory to use: spark.eventLog.dir file:/tmp/spark-events
By default, /tmp/spark-events is mounted on ./spark-events at the root of the repo (which I call $DOCKER_SPARK), so you have to tell the driver to log events to this directory (on your local machine).
The example below shows this configuration for a spark-submit (the two --conf options):
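(The master URL, class name, and jar path are placeholders for your own application.)

spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=file:/tmp/spark-events \
  --master spark://${YOUR_DOCKER_HOST}:7077 \
  --class com.example.MyApp \
  /path/to/my-app.jar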
Note: these settings can be defined in the driver's $SPARK_HOME/conf/spark-defaults.conf to avoid using the --conf options.
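For example, the equivalent spark-defaults.conf entries would be:

spark.eventLog.enabled   true
spark.eventLog.dir       file:/tmp/spark-events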
This comment comes from my blog post.