apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Duplicated secret volume in pod spec #594

Closed. hex108 closed this issue 6 years ago.

hex108 commented 6 years ago

When specifying a secret (e.g. --conf spark.kubernetes.driver.secrets.test=pass) on the spark-submit command line, pod creation fails with a "Duplicate value: \"XXX-volume\"" error. This happens because the secret volume is added to the pod spec twice: mountSecret is invoked once for the main container and once for the init container.
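
To make the failure mode concrete, here is a minimal sketch (hypothetical helper and names, not the actual submission-step code) of how appending the secret-backed volume once for the main container and once for the init container leaves two spec.volumes entries with the same name, which the API server rejects as FieldValueDuplicate:

    import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

    // Simplified stand-in for the secret-mounting step: every call appends a
    // volume named "<secret>-volume" to the pod spec.
    def mountSecret(pod: Pod, secretName: String): Pod =
      new PodBuilder(pod)
        .editOrNewSpec()
          .addNewVolume()
            .withName(s"$secretName-volume")  // same name on every call
            .withNewSecret().withSecretName(secretName).endSecret()
          .endVolume()
        .endSpec()
        .build()

    val base = new PodBuilder()
      .withNewMetadata().withName("spark-pi-driver").endMetadata()
      .withNewSpec().endSpec()
      .build()

    // One call per container (main container, then init container) leaves two
    // identically named "test-volume" entries in spec.volumes, so the POST to
    // the API server fails with code 422 / FieldValueDuplicate.
    val duplicated = mountSecret(mountSecret(base, "test"), "test")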

Command line:

$ bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://http://localhost:8080 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=jungong/spark-driver:hdfs \
  --conf spark.kubernetes.executor.docker.image=jungong/spark-executor:hdfs \
  --conf spark.kubernetes.initcontainer.docker.image=jungong/spark-init:hdfs \
  --conf spark.kubernetes.resourceStagingServer.uri=http://10.178.106.222:31000 \
  --conf spark.kubernetes.initcontainer.inannotation=true \
  --conf spark.kubernetes.driver.secrets.test=pass \
  examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar

Error message:

Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: 
Failure executing: POST at: http://localhost:8080/api/v1/namespaces/default/pods.
 Message: Pod "spark-pi-1514974049945-driver" is invalid: spec.volumes[7].name:
 Duplicate value: "test-volume". Received status: Status(apiVersion=v1, code=422, 
details=StatusDetails(causes=[StatusCause(field=spec.volumes[7].name, message=Duplicate 
value: "test-volume", reason=FieldValueDuplicate, additionalProperties={})], group=null, kind=Pod, 
name=spark-pi-1514974049945-driver, retryAfterSeconds=null, uid=null, additionalProperties={}), 
kind=Status, message=Pod "spark-pi-1514974049945-driver" is invalid: spec.volumes[7].name: 
Duplicate value: "test-volume", metadata=ListMeta(resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:470)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:409)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:226)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:769)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:356)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$3.apply(Client.scala:132)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$3.apply(Client.scala:131)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2551)
    at org.apache.spark.deploy.k8s.submit.Client.run(Client.scala:131)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:200)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:193)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2551)
    at org.apache.spark.deploy.k8s.submit.Client$.run(Client.scala:193)
    at org.apache.spark.deploy.k8s.submit.Client$.main(Client.scala:213)
    at org.apache.spark.deploy.k8s.submit.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:786)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
liyinan926 commented 6 years ago

Ah, yes, this is a bug. I will fix it both upstream in branch-2.3 and here.

hex108 commented 6 years ago

@liyinan926 Thanks. I have a patch available. If you have not started, I could file a PR for it.

liyinan926 commented 6 years ago

@hex108 Thanks for creating a patch! Can you create a PR? Thanks!

liyinan926 commented 6 years ago

@hex108 Since the same bug also exists upstream in apache/branch-2.3 and it is critical and urgent to get the fix into 2.3, I went ahead and created PR https://github.com/apache/spark/pull/20148 against apache/branch-2.3. Please take a look at that PR and let us know if the fix looks reasonable to you. Thanks for reporting the bug, and feel free to suggest a different fix if necessary!
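
For readers following along, one way such a fix can look (only a hedged sketch of the general idea, not the code in the linked PR) is to append the secret-backed volume to the pod spec only if a volume with that name is not already present:

    import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
    import scala.collection.JavaConverters._

    // Hedged sketch of a de-duplicating variant: the volume is appended only if
    // no volume with the same name exists yet, so running the secret-mounting
    // step for both the main container and the init container stays safe.
    def addSecretVolumeIfAbsent(pod: Pod, secretName: String): Pod = {
      val volumeName = s"$secretName-volume"
      val existingVolumes = Option(pod.getSpec)
        .flatMap(spec => Option(spec.getVolumes))
        .map(_.asScala)
        .getOrElse(Nil)
      if (existingVolumes.exists(_.getName == volumeName)) {
        pod
      } else {
        new PodBuilder(pod)
          .editOrNewSpec()
            .addNewVolume()
              .withName(volumeName)
              .withNewSecret().withSecretName(secretName).endSecret()
            .endVolume()
          .endSpec()
          .build()
      }
    }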

hex108 commented 6 years ago

@liyinan926 Looks like I was a little late... I'll review it. Will close #595 soon.

liyinan926 commented 6 years ago

Thanks, @hex108! I saw your PR, and it looks like it makes semantically the same fix. Please don't close it for now, as the fix is still needed for this fork.