Closed (ash211 closed this 7 years ago)
You can set a custom hostname for Spark to use; see Mesos here: https://github.com/apache/spark/blob/bb7afb4e10bea406a0d7ab03c2ed7aa753f081b7/core/src/main/scala/org/apache/spark/executor/Executor.scala#L79
After the above change, executors are launched with this difference in the podspec (redacted a bit):
Before:
[user@ip-10-0-11-20 ~]$ kubectl get pod -n my-ns-69ede3b8943b4b108834fbaa0ba24e16 my-pod-69ede3b8-943b-4b10-8834-fbaa0ba24e16-global-sql-0-1504250692857-exec-1 -o yaml | grep -A1 SPARK_DRIVER_URL
- name: SPARK_DRIVER_URL
value: spark://CoarseGrainedScheduler@10.255.184.2:45978
[user@ip-10-0-11-20 ~]$
After:
[user@ip-10-0-11-20 ~]$ kubectl get pod -n my-ns-98390f2ec589476f9b5ec0b62d2c5c5e my-pod-98390f2e-c589-476f-9b5e-c0b62d2c5c5e-global-sql-0-1504252263824-exec-1 -o yaml | grep -A1 SPARK_DRIVER_URL
- name: SPARK_DRIVER_URL
value: spark://CoarseGrainedScheduler@my-pod-98390f2e-c589-476f-9b5e-c0b62d2c5c5e-global-sql-0-150425226:36142
[user@ip-10-0-11-20 ~]$
This is the hostname that the Spark executor uses to connect to the driver (it's the driver's hostname), but there are problems:
- The hostname is truncated (it is missing the -driver suffix) because it exceeds the 63-character hostname component limit
Pods by default don't get DNS names in k8s. Headless Services would allow us to create one without incurring much overhead.
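To make the 63-character limit mentioned above concrete, here is a minimal sketch that checks each dot-separated component of a hostname against that limit. The function name and the example pod name (the executor pod name from the output above with `-exec-1` swapped for `-driver`) are assumptions for illustration, not values taken from the actual cluster.

```python
from typing import List

# Maximum length of a single hostname component (DNS label).
MAX_DNS_LABEL_LENGTH = 63


def oversized_labels(hostname: str) -> List[str]:
    """Return the dot-separated components that exceed the label limit."""
    return [
        label
        for label in hostname.split(".")
        if len(label) > MAX_DNS_LABEL_LENGTH
    ]


if __name__ == "__main__":
    # Hypothetical driver pod name, reconstructed for illustration only.
    driver_pod = (
        "my-pod-98390f2e-c589-476f-9b5e-c0b62d2c5c5e"
        "-global-sql-0-1504252263824-driver"
    )
    too_long = oversized_labels(driver_pod)
    if too_long:
        print("components over the limit:", too_long)
    else:
        print("hostname fits within the limit")
```

A check like this could run before the driver URL is built, so that an over-long name is reported instead of being silently cut short.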
+1 to @foxish's suggestion. In general, there is a concern about putting too many entries in kube-dns, which is why k8s doesn't support DNS names for pods by default. But I think having one DNS name per job isn't too bad.
Excellent - let's go with the headless service approach. I can propose a change.
Note, though, that using a service implies that DNS has to be running in the cluster for Spark to work. I think this is fine since, to my understanding, most real clusters will have DNS. It's worth calling out in the documentation, however.
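For reference, a headless Service is an ordinary Service whose `spec.clusterIP` is set to `None`. The sketch below builds such a manifest as a plain dictionary. The service name, namespace, selector label, port name, and port number are all hypothetical placeholders; this is not the change that was eventually proposed.

```python
import json
from typing import Dict


def headless_service_manifest(
    name: str, namespace: str, selector: Dict[str, str], port: int
) -> dict:
    """Build a Service manifest with clusterIP set to None (headless)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # "None" is what makes the Service headless: no virtual IP is
            # allocated and DNS resolves directly to the selected pod(s).
            "clusterIP": "None",
            "selector": dict(selector),
            "ports": [{"name": "driver-rpc", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    manifest = headless_service_manifest(
        name="spark-driver-svc",           # hypothetical service name
        namespace="my-ns",                 # hypothetical namespace
        selector={"spark-role": "driver"},  # hypothetical pod label
        port=36142,                        # hypothetical driver RPC port
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be fed to `kubectl apply -f -`, assuming the selector matches a label that is actually set on the driver pod.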
@foxish to clarify, does the hostname we should set for the driver's URL map to exactly the name of the service? What's the mapping from namespace + service name to appropriate hostname?
Edit: Found the answer in https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
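As a rough illustration of the mapping described on that page, a Service gets a DNS name of the form `<service>.<namespace>.svc.<cluster-domain>`. The sketch below assumes the common default cluster domain `cluster.local`; the helper names, service name, namespace, and port are hypothetical.

```python
def service_dns_name(
    service: str, namespace: str, cluster_domain: str = "cluster.local"
) -> str:
    """Return the fully qualified DNS name Kubernetes assigns to a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def driver_url(
    service: str, namespace: str, port: int, cluster_domain: str = "cluster.local"
) -> str:
    """Build a SPARK_DRIVER_URL value that points at the Service's DNS name."""
    host = service_dns_name(service, namespace, cluster_domain)
    return f"spark://CoarseGrainedScheduler@{host}:{port}"


if __name__ == "__main__":
    # Hypothetical service name, namespace, and port.
    print(driver_url("spark-driver-svc", "my-ns", 36142))
    # prints spark://CoarseGrainedScheduler@spark-driver-svc.my-ns.svc.cluster.local:36142
```

Because the service name is chosen independently of the pod name, it can be kept short enough to stay under the 63-character component limit.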
This PR breaks k8s work
https://issues.apache.org/jira/browse/SPARK-21642