Closed: hardymansen closed this issue 7 years ago
I'm using kubectl proxy and that step is working (I ran into other issues further on).
Are you sure MYMASTER includes your port number? (Sorry, it's the first thing that comes to mind.) E.g.:
--master k8s://http://127.0.0.1:8001 \
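For reference, a minimal sketch of that proxy-based flow (hedged: 8001 is kubectl proxy's default local port, and everything after --master is just a placeholder for the rest of your command):

kubectl proxy &                          # exposes the API server locally on 127.0.0.1:8001
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://http://127.0.0.1:8001 \
  ... \
  local:///opt/spark/examples/src/main/python/pi.py 10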
Thanks. The URL is https://api.k8s-dev.k8s.aws.mycompany.com and the API is listening on port 443. I tried adding :443, but that didn't help.
I would suspect it wouldn't work anyway, since my driver image doesn't have the correct permissions to spawn a container. But you would think I'd get some sane error message if that were the problem: a timeout against the URL, a permissions error, TLS, or something in that fashion. Instead it explicitly says "a master URL must be set".
To me that points to something in Spark, or at least a bad error message.
It's possible that JVM options aren't being properly propagated in PySpark's case - @ifilonenko, would you have any ideas here?
The JVM options are being defined when launching the PythonRunner class in the driver-py CMD, so it doesn't seem likely, but I will investigate.
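For context, the mechanism under discussion, sketched very roughly (this is not the exact driver-py CMD): the submission client puts each JVM option into a SPARK_JAVA_OPT_N environment variable on the driver pod, and the container entrypoint is expected to turn those back into -D flags on the JVM that launches PythonRunner.

SPARK_DRIVER_JAVA_OPTS=()
while IFS= read -r kv; do
  SPARK_DRIVER_JAVA_OPTS+=("${kv#*=}")   # keep the value, e.g. -Dspark.master=k8s://https://MYMASTER
done < <(env | grep '^SPARK_JAVA_OPT_')
exec "${JAVA_HOME}/bin/java" "${SPARK_DRIVER_JAVA_OPTS[@]}" \
  -cp "${SPARK_HOME}/jars/*" \
  org.apache.spark.deploy.PythonRunner "$@"   # "$@" stands in for the Python file arguments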
I think the configurations from spark-submit are not making it into the Python application for some reason. Are you using the same Spark distribution for spark-submit as the one used to build the Docker images?
I git-cloned https://github.com/apache-spark-on-k8s/spark and built it with mvn -DskipTests clean package, I believe. I can try again with a clean repo in case I did something weird.
I think we need -Pkubernetes somewhere in that command. But then again, the fact that a pod is even being created by spark-submit is surprising: without that profile, spark-submit shouldn't know about Kubernetes mode at all and should have failed on that instead.
Can you try running spark-submit from a downloaded distribution of Spark - one of the release tarballs that we host? https://github.com/apache-spark-on-k8s/spark/releases
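(Roughly, with placeholder names since the exact tarball depends on the release you pick:)

tar -xzf <downloaded-release-tarball>.tgz
cd <extracted-spark-dir>
bin/spark-submit --deploy-mode cluster --master k8s://https://MYMASTER ...   # same flags as before, but using this distribution's spark-submit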
Sorry, maybe I did have -P kubernetes; sure, I will try that. Thank you.
mvn -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.2 -DskipTests clean install
Tada! Awesome, the binaries worked. Thank you! Output: "Pi is roughly 3.140224"
Nice! Can this issue be closed?
From my end, yes definitely. Thank you.
I start the driver in the Kubernetes cluster with:
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://MYMASTER \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=PythonPi \
  --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.1.0-kubernetes-0.3.1 \
  --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.1 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.3.1-SNAPSHOT.jar \
  --py-files local:///examples/src/main/python/sort.py \
  local:///opt/spark/examples/src/main/python/pi.py 10
The driver pod crashes with:
2017-09-25T12:47:46.606681025Z Traceback (most recent call last):
2017-09-25T12:47:46.606721291Z   File "/opt/spark/examples/src/main/python/pi.py", line 32, in <module>
2017-09-25T12:47:46.606727837Z     .appName("PythonPi")\
2017-09-25T12:47:46.606731142Z   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
2017-09-25T12:47:46.606734166Z   File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 307, in getOrCreate
2017-09-25T12:47:46.606737106Z   File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
2017-09-25T12:47:46.606740009Z   File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 159, in _do_init
2017-09-25T12:47:46.606744734Z Exception: A master URL must be set in your configuration
2017-09-25T12:47:46.750364405Z Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
2017-09-25T12:47:46.750388374Z   at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:96)
2017-09-25T12:47:46.750392513Z   at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
The env var SPARK_JAVA_OPT_2 is set to: -Dspark.master=k8s://https://MYMASTER
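(A couple of hypothetical checks against the failing driver pod, using whatever pod name spark-submit reported, to confirm whether that option actually reaches the container:)

kubectl describe pod <driver-pod-name> | grep SPARK_JAVA_OPT_    # env vars as the submission client set them
kubectl logs <driver-pod-name>                                    # the PythonRunner output quoted above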