apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

PySpark Submission Failing on --py-files #406

Closed ifilonenko closed 7 years ago

ifilonenko commented 7 years ago

Currently the integration tests check the inputs to Client#run() but do not test completely end to end, starting from a spark-submit command. As a result, it was overlooked that PySpark submission fails during argument processing in fromClientArguments() (shown as ClientArguments.fromCommandLineArgs in the stack trace below). The problem is that when --py-files is null, the argument handling still tries to run .mkString() on --other-py-files, causing the job to fail. Example (note that no --py-files is passed):

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://192.168.99.100:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.1.0-kubernetes-0.3.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.0 \
  local:///opt/spark/examples/src/main/python/pi.py 10

Exception in thread "main" java.lang.RuntimeException: Unknown arguments: --other-py-files null
    at org.apache.spark.deploy.kubernetes.submit.ClientArguments$$anonfun$fromCommandLineArgs$1.applyOrElse(Client.scala:58)
    at org.apache.spark.deploy.kubernetes.submit.ClientArguments$$anonfun$fromCommandLineArgs$1.applyOrElse(Client.scala:45)
    at scala.collection.immutable.List.collect(List.scala:303)
    at org.apache.spark.deploy.kubernetes.submit.ClientArguments$.fromCommandLineArgs(Client.scala:45)
    at org.apache.spark.deploy.kubernetes.submit.Client$.main(Client.scala:199)
    at org.apache.spark.deploy.kubernetes.submit.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:783)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
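
The failure mode described above can be illustrated with a minimal Scala sketch. This is not the project's actual code: the object PyFilesArgsSketch and the helpers buildOtherPyFilesArgs and parseOtherPyFiles are hypothetical names introduced only for illustration, and the parser is merely assumed to resemble the flag/value matching suggested by the stack trace. The idea is to wrap the possibly-null --py-files value in an Option so that no --other-py-files argument is emitted at all when the user did not pass --py-files.

```scala
// Hypothetical sketch: names are illustrative, not the project's actual code.
object PyFilesArgsSketch {

  // Builds the extra submission arguments for additional Python files.
  // `pyFiles` may be null when the user did not pass --py-files.
  def buildOtherPyFilesArgs(pyFiles: Array[String]): Seq[String] =
    Option(pyFiles)                 // null becomes None
      .filter(_.nonEmpty)           // treat an empty list like "not set"
      .map(files => Seq("--other-py-files", files.mkString(",")))
      .getOrElse(Seq.empty)         // emit nothing rather than "--other-py-files null"

  // Assumed parser shape: consume flag/value pairs and reject anything
  // unrecognised, including a flag whose value is null.
  def parseOtherPyFiles(args: Seq[String]): Seq[String] =
    args.grouped(2).toList.flatMap {
      case Seq("--other-py-files", value) if value != null =>
        value.split(",").toSeq
      case other =>
        throw new RuntimeException(s"Unknown arguments: ${other.mkString(" ")}")
    }
}
```

With this shape, a submission without --py-files produces an empty argument list, which the parser accepts, while a submission with --py-files a.py,b.py round-trips to the two file names. Passing the pair ("--other-py-files", null) directly to the sketch parser raises a RuntimeException with the same "Unknown arguments" wording as in the trace above.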

ifilonenko commented 7 years ago

@mccheah @foxish I'm pushing a PR with a fix. Do let me know if there is a way to work around this without a PR, so that the release isn't affected.