Hydrospheredata / mist

Serverless proxy for Spark cluster
http://hydrosphere.io/mist/
Apache License 2.0

Support for spark-on-k8s kubernetes #430

Closed austinnichols101 closed 6 years ago

austinnichols101 commented 6 years ago

Ref: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html

As of Spark 2.2, Kubernetes support is experimental. However, it appears that Spark 2.3 will fold work from this project into mainline Spark.

For testing, we created our own version of the mist Dockerfile, substituting the "Spark with Kubernetes Support" tarball from this page:

https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
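The only Dockerfile change was swapping out the Spark download. A rough sketch of that change (the tarball URL is a placeholder and the install path is an assumption, not our exact Dockerfile):

# Hypothetical Dockerfile fragment: fetch the k8s-enabled Spark build instead of
# the stock Apache tarball; <k8s-spark-tarball-url> and /usr/share are placeholders
RUN curl -L <k8s-spark-tarball-url> -o /tmp/spark.tgz \
 && tar -xzf /tmp/spark.tgz -C /usr/share \
 && rm /tmp/spark.tgz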

We then tested a manual spark-submit from inside the new Docker container to verify connectivity and rule out Spark configuration issues. This is how the spark-submit was formatted:

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar

We then created mist context, function, and artifact files, using the hello_mist examples as a template.
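For reference, the context pointed Spark at the k8s API server with the same settings used in the manual spark-submit above. A rough sketch, modeled on the hello_mist conf layout (the file name, context name, and exact keys are assumptions on our part):

# Hypothetical context definition, applied with mist-cli; the spark-conf keys
# mirror the spark-submit flags above (file layout and key names are assumptions)
cat > 00_k8s_context.conf <<'EOF'
model = Context
name = k8s
data {
  spark-conf {
    spark.master = "k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>"
    spark.kubernetes.namespace = "default"
    spark.executor.instances = "5"
    spark.kubernetes.driver.docker.image = "kubespark/spark-driver:v2.2.0-kubernetes-0.5.0"
    spark.kubernetes.executor.docker.image = "kubespark/spark-executor:v2.2.0-kubernetes-0.5.0"
  }
}
EOF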

The first problem concerns dependencies. From the documentation:

Application dependencies that are being submitted from your machine need to be sent to a resource staging server that the driver and executor can then communicate with to retrieve those dependencies.

k8s expects dependencies to be available either from the in-cluster resource staging server or from a remote URI, since the driver and executor pods are created on the fly when the job is submitted.
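In other words, a locally built jar only works if spark-submit is pointed at the staging server. A sketch of how the fork's docs wire that up (the staging-server address and jar path are placeholders; this is the fork's documented option, not something mist exposes today):

# Hypothetical: same submit as above, but staging a local jar through the
# in-cluster resource staging server (<staging-server-host:port> is a placeholder)
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<staging-server-host>:<staging-server-port> \
  /path/to/local/spark-examples_2.11-2.2.0-k8s-0.5.0.jar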

The second problem we encountered was that mist was able to submit the job, but k8s was unable to spin up the worker, failing repeatedly with a "must specify the driver pod name" error. We also noticed that even after the jobs failed (or were manually terminated), mist kept attempting to spin up workers in the background (possible bug?).
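For what it's worth, that message appears to correspond to the fork's spark.kubernetes.driver.pod.name setting, which the cluster-mode submission client normally fills in itself; whether passing it explicitly through the context's spark-conf would help a mist-spawned worker is only a guess on our side:

# Hypothetical guess, not a confirmed fix: tell the k8s scheduler backend which
# pod the driver is running in (<driver-pod-name> is a placeholder)
--conf spark.kubernetes.driver.pod.name=<driver-pod-name>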

blvp commented 6 years ago

Fixed with #460