Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
Currently the integration tests check inputs into Client#run() but do not test fully end-to-end starting from a spark-submit command. As a result, it was overlooked that PySpark submission fails because of argument processing in ClientArguments#fromCommandLineArgs(). The problem is that when --py-files is not provided, fromCommandLineArgs still calls .mkString() on the null value passed as --other-py-files, causing the job to fail. Example:
bin/spark-submit \
--deploy-mode cluster \
--master k8s://https://192.168.99.100:8443 \
--kubernetes-namespace default \
--conf spark.executor.instances=5 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.1.0-kubernetes-0.3.0 \
--conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.0 \
--conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.0 \
local:///opt/spark/examples/src/main/python/pi.py 10
Exception in thread "main" java.lang.RuntimeException: Unknown arguments: --other-py-files null
at org.apache.spark.deploy.kubernetes.submit.ClientArguments$$anonfun$fromCommandLineArgs$1.applyOrElse(Client.scala:58)
at org.apache.spark.deploy.kubernetes.submit.ClientArguments$$anonfun$fromCommandLineArgs$1.applyOrElse(Client.scala:45)
at scala.collection.immutable.List.collect(List.scala:303)
at org.apache.spark.deploy.kubernetes.submit.ClientArguments$.fromCommandLineArgs(Client.scala:45)
at org.apache.spark.deploy.kubernetes.submit.Client$.main(Client.scala:199)
at org.apache.spark.deploy.kubernetes.submit.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:783)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
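A defensive fix is to treat the optional flag as an Option and omit it entirely when no Python files were supplied, rather than formatting a null into the argument list. The sketch below is illustrative only (the object and method names are hypothetical, not the actual Client.scala code):

```scala
// Hypothetical sketch of null-safe handling for an optional --other-py-files flag.
// Names here are illustrative; the real fix lives in ClientArguments.fromCommandLineArgs.
object PyFilesArgs {
  // Wrap a possibly-null comma-separated value into an Option of file names.
  def parsePyFiles(raw: String): Option[Seq[String]] =
    Option(raw).map(_.split(",").toSeq).filter(_.nonEmpty)

  // Emit the flag only when files are actually present;
  // an absent value produces no arguments instead of "--other-py-files null".
  def buildArgs(pyFiles: Option[Seq[String]]): Seq[String] =
    pyFiles match {
      case Some(files) if files.nonEmpty =>
        Seq("--other-py-files", files.mkString(","))
      case _ => Seq.empty
    }
}
```

With this shape, a submission without --py-files yields an empty argument list, so the unknown-argument check in the command-line parser is never triggered.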