apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

How to specify limits.cpu, limits.memory, requests.cpu, requests.memory of driver & executor in spark-submit #611

Open Bilwang129 opened 6 years ago

Bilwang129 commented 6 years ago

@liyinan926 @foxish When running the following command (submitting the local jar via Dependency Management):

export SPARK_HOME=/home/hadoop/nan.wang/spark-2.2.0-k8s-0.5.0-bin-2.7.3
${SPARK_HOME}/bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://172.20.0.113:6443 \
  --class org.apache.spark.examples.SparkPi \
  --kubernetes-namespace automodel \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.driver.memory=500M \
  --conf spark.executor.memory=500M \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.executor.cores=0.1 \
  --conf spark.driver.cores=0.1 \
  --conf spark.kubernetes.driver.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.resourceStagingServer.uri=http://172.20.0.114:30001 \
  ${SPARK_HOME}/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar

I get the following errors:

2018-02-08 10:21:42 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-02-08 10:21:42 INFO SecurityManager:54 - Changing view acls to: hadoop
2018-02-08 10:21:42 INFO SecurityManager:54 - Changing modify acls to: hadoop
2018-02-08 10:21:42 INFO SecurityManager:54 - Changing view acls groups to:
2018-02-08 10:21:42 INFO SecurityManager:54 - Changing modify acls groups to:
2018-02-08 10:21:42 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.20.0.113:6443/api/v1/namespaces/automodel/pods. Message: Forbidden! User automodeluser doesn't have permission. pods "spark-pi-1518056501973-driver" is forbidden: failed quota: compute-resources: must specify limits.cpu,limits.memory,requests.cpu,requests.memory.
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:470)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:407)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:226)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:769)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:356)
  at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$3.apply(Client.scala:124)
  at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$3.apply(Client.scala:123)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2551)
  at org.apache.spark.deploy.k8s.submit.Client.run(Client.scala:123)
  at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:191)
  at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:184)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2551)
  at org.apache.spark.deploy.k8s.submit.Client$.run(Client.scala:184)
  at org.apache.spark.deploy.k8s.submit.Client$.main(Client.scala:204)
  at org.apache.spark.deploy.k8s.submit.Client.main(Client.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:786)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-02-08 10:21:44 INFO ShutdownHookManager:54 - Shutdown hook called
2018-02-08 10:21:44 INFO ShutdownHookManager:54 - Deleting directory /tmp/uploaded-jars-e65a4f76-5001-44a2-9c1b-16941a4362c0
2018-02-08 10:21:44 INFO ShutdownHookManager:54 - Deleting directory /tmp/uploaded-files-6f4709f0-e0c9-40ba-8143-cc0ad4a02780
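
The "failed quota: compute-resources" part of the message refers to a ResourceQuota object in the automodel namespace; it can be inspected with a command along these lines:

kubectl describe quota compute-resources --namespace=automodel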

But when running the following command (with the same parameters, but using the jar that is already inside the Docker image):

export SPARK_HOME=/home/hadoop/nan.wang/spark-2.2.0-k8s-0.5.0-bin-2.7.3
${SPARK_HOME}/bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://172.20.0.113:6443 \
  --class org.apache.spark.examples.SparkPi \
  --kubernetes-namespace automodel \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.driver.memory=500M \
  --conf spark.executor.memory=500M \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.executor.cores=0.1 \
  --conf spark.driver.cores=0.1 \
  --conf spark.kubernetes.driver.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar

It runs successfully, and the resources on the driver & executor pods look like the following:

driver & executor:
  resources:
    limits:
      cpu: "1"
      memory: 884Mi
    requests:
      cpu: 100m
      memory: 500Mi
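
For reference, these values can be read from the running pods with something like the command below; the pod name is illustrative, since it is generated per submission:

kubectl get pod spark-pi-1518056501973-driver --namespace=automodel \
  -o jsonpath='{.spec.containers[0].resources}'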

I want to know which conf I can use to specify limits.cpu, limits.memory, requests.cpu, and requests.memory of the driver & executor in spark-submit.

liyinan926 commented 6 years ago

Which version of Kubernetes are you using? Can you run the following command?

kubectl describe limits --namespace=automodel

Bilwang129 commented 6 years ago

Kubernetes version: v1.8.5. I can run kubectl describe limits --namespace=automodel, but it returns nothing. @liyinan926

liyinan926 commented 6 years ago

OK, then it makes sense why it said must specify limits.cpu,limits.memory,requests.cpu,requests.memory, since the namespace you used does not have a default value for any of them. It's weird that the run using the container-local example jar worked fine. I suspect there's a bug in the code such that when local dependencies need to be uploaded and an additional step is needed to set up the init-container, the resource requests set in the BaseDriverConfigurationStep got lost.
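
As an aside: independently of the Spark-side bug, the quota rejection can be avoided by giving the namespace container defaults via a LimitRange, so pods without explicit requests/limits still satisfy the ResourceQuota. A minimal sketch; the object name and the default values below are illustrative, not taken from this cluster:

cat <<EOF | kubectl apply --namespace=automodel -f -
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
spec:
  limits:
  - type: Container
    # applied to containers that do not set requests
    defaultRequest:
      cpu: 100m
      memory: 500Mi
    # applied to containers that do not set limits
    default:
      cpu: "1"
      memory: 1Gi
EOF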

Bilwang129 commented 6 years ago

I also ran the following command to specify limits.cpu, limits.memory, requests.cpu, and requests.memory, with the intended mapping:

requests.memory: spark.driver.memory=500M
requests.cpu: spark.driver.cores=0.1
limits.memory: spark.driver.memory + spark.kubernetes.driver.memoryOverhead = 900M
limits.cpu: spark.kubernetes.driver.limit.cores=1

export SPARK_HOME=/home/hadoop/nan.wang/spark-2.2.0-k8s-0.5.0-bin-2.7.3
${SPARK_HOME}/bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --kubernetes-namespace automodel \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.driver.memory=500M \
  --conf spark.executor.memory=500M \
  --conf spark.kubernetes.driver.memoryOverhead=400M \
  --conf spark.kubernetes.executor.memoryOverhead=400M \
  --conf spark.driver.cores=0.1 \
  --conf spark.executor.cores=0.1 \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.kubernetes.driver.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.resourceStagingServer.uri=http://172.20.0.115:30001 \
  ${SPARK_HOME}/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar

But it also does not work.

liyinan926 commented 6 years ago

But it also does not work.

What doesn't work here? You were not able to run the above example (even if it used the container-local example jar)? Or something else?

Bilwang129 commented 6 years ago

I have modified the above command. It does not work when using Dependency Management.

liyinan926 commented 6 years ago

OK. It looks like a bug.

Bilwang129 commented 6 years ago

Is there another way to run the local application jar from the submitting machine?

Bilwang129 commented 6 years ago

@liyinan926

liyinan926 commented 6 years ago

@Bilwang129 If you have access to an HDFS cluster, or to cloud storage such as S3, you can upload the jars there and use the remote URLs of those jars. Spark can download them automatically.
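
For example, a sketch assuming the example jar is first uploaded to an HDFS cluster; the hdfs://namenode:9000 address and target path are illustrative, not from this thread:

# Upload the application jar to HDFS (namenode address and path are hypothetical)
hdfs dfs -put ${SPARK_HOME}/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
  hdfs://namenode:9000/jars/

# Reference the jar by its remote URL instead of a path on the submitting machine,
# so nothing has to be staged through the resource staging server
${SPARK_HOME}/bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://172.20.0.113:6443 \
  --class org.apache.spark.examples.SparkPi \
  --kubernetes-namespace automodel \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.driver.memory=500M \
  --conf spark.executor.memory=500M \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.kubernetes.driver.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=sz-pg-oam-docker-hub-001.tendcloud.com/library/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  hdfs://namenode:9000/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar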