apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Unable to submit files from local systems to pyspark #603

Open ravi-ramadoss opened 6 years ago

ravi-ramadoss commented 6 years ago

I am trying to test a local Spark script. Whenever I try to upload a file from my local Mac to the minikube cluster, I get the error below.

$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://192.168.99.100:8443 \
  --kubernetes-namespace spark \
  --conf spark.executor.instances=1 \
  --conf spark.executor.memory=512m \
  --conf spark.driver.memory=512m \
  --conf spark.app.name=spark-pi \
  --conf spark.executor.cores=0.2 \
  --conf spark.driver.cores=0.2 \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
  --py-files schools.py \
  schools.py

I see the following error in the dashboard for the driver pod:

MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "spark-pi-1515860438062-init-config" not found

Image: kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0

Environment variables:

SPARK_DRIVER_MEMORY: 896m
SPARK_DRIVER_CLASS: org.apache.spark.deploy.PythonRunner
SPARK_DRIVER_ARGS: 
SPARK_MOUNTED_FILES_DIR: /var/spark-data/spark-files
PYSPARK_PRIMARY: /var/spark-data/spark-files/schools.py
PYSPARK_FILES: /var/spark-data/spark-files/schools.py
SPARK_DRIVER_JAVA_OPTS: -Dspark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 -Dspark.executor.memory=512m -Dspark.kubernetes.initcontainer.executor.configmapkey=download-submitted-files -Dspark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 -Dspark.app.name=spark-pi -Dspark.submit.deployMode=cluster -Dspark.executor.cores=0.2 -Dspark.kubernetes.driver.pod.name=spark-pi-1515860438062-driver -Dspark.master=k8s://https://192.168.99.100:8443 -Dspark.driver.memory=512m -Dspark.kubernetes.namespace=spark -Dspark.kubernetes.executor.podNamePrefix=spark-pi-1515860438062 -Dspark.files=/var/spark-data/spark-files/schools.py,/var/spark-data/spark-files/schools.py -Dspark.kubernetes.initcontainer.executor.configmapname=spark-pi-1515860438062-init-config -Dspark.app.id=spark-79017a9be721488e8be810480838412a -Dspark.executor.instances=1 -Dspark.driver.cores=0.2

Commands: -
Args: -
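A quick way to check whether the configmap from that error exists at all (a sketch, assuming the spark namespace and the driver pod name shown above):

kubectl get configmaps -n spark
kubectl describe pod spark-pi-1515860438062-driver -n spark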

ifilonenko commented 6 years ago

You need the init-container and the resource staging server (RSS) defined as part of the conf. Look at the usage docs for an example.
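A minimal sketch of the extra conf, using the flags that appear later in this thread (the image tag and staging-server address here are taken from the follow-up command, not necessarily what your deployment uses):

  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://192.168.99.100:31000 \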

ravi-ramadoss commented 6 years ago

I followed the steps from the page https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html

I am sure I am missing something. Is there a walkthrough or example of how to do this?

kubectl create -f conf/kubernetes-resource-staging-server.yaml
$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://192.168.99.100:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://192.168.99.100:31000 \
  --py-files pi.py \
  pi.py

I still get the same error:

MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "spark-pi-1516935374044-init-config" not found
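One thing worth checking (a sketch, assuming the staging server was created in the default namespace from conf/kubernetes-resource-staging-server.yaml) is whether the resource staging server pod is actually running and whether the URI passed to spark-submit is reachable from the machine running the submission:

kubectl get pods -n default
kubectl get svc -n default

and, to confirm the NodePort is reachable from the submitting machine (any HTTP status, even an error code, means the port is open):

curl -s -o /dev/null -w "%{http_code}\n" http://192.168.99.100:31000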