apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Explicitly expose ports on driver container, fixes #617 #618

Closed adelbertc closed 5 years ago

adelbertc commented 6 years ago

What changes were proposed in this pull request?

Expose ports explicitly in the driver container. The driver Service that is created expects to reach the driver Pod on specific ports which, before this change, were not explicitly exposed and could cause connection issues (see #617).
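For illustration, the effect of the change is to declare the Service's target ports on the driver container spec. This is a minimal sketch, not the actual patch; the names, image, and port numbers shown here are assumptions chosen for illustration:

```yaml
# Illustrative sketch only: a driver Pod whose container explicitly
# declares the ports the headless driver Service targets.
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver               # hypothetical name
spec:
  containers:
    - name: spark-kubernetes-driver
      image: spark-driver:latest   # hypothetical image
      ports:
        - containerPort: 7078      # driver RPC port (illustrative)
        - containerPort: 7079      # block manager port (illustrative)
        - containerPort: 4040      # Spark UI (illustrative)
```

With some network overlays, traffic to undeclared ports may be dropped, which is the failure mode discussed in #617.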

How was this patch tested?

Reproduced the failure from #617 on Kubernetes 1.6.x and 1.8.x. Built the driver image with this patch and confirmed that it fixed #617 on Kubernetes 1.6.x.

liyinan926 commented 6 years ago

We have never run into this issue as reported here. @foxish.

adelbertc commented 6 years ago

I imagine that if a K8s cluster is secured with a network policy or similar, pods are not exposed by default, which would make this change necessary.
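To illustrate the scenario being speculated about (this is not from the PR itself), a namespace-wide default-deny ingress policy like the following would drop all inbound traffic to every pod in the namespace, including the driver, until ingress is explicitly allowed:

```yaml
# Illustrative only: a default-deny ingress NetworkPolicy. With no
# ingress rules listed, all inbound traffic to selected pods is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
  namespace: default
spec:
  podSelector: {}              # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress                  # Ingress listed with no rules => deny all ingress
```

Whether explicitly declared container ports change behaviour under such a policy depends on the CNI plugin in use.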

rvesse commented 6 years ago

@liyinan926 Just to confirm: we've seen a very similar issue with specific network overlays (Romana), but have not encountered it with others.

liyinan926 commented 6 years ago

@rvesse are you able to test this patch and see if it fixes your issue?

rvesse commented 6 years ago

@liyinan926 We have a different patch internally that fixes our issue; I'm not sure it is the exact same problem, though it is clearly closely related. The developer who did the debugging and the patch is currently out on jury duty until mid next week, so I am waiting for her to get back and take a look at this.

mccheah commented 6 years ago

Can this be posted against apache/spark instead?

foxish commented 6 years ago

+1, please post the fix against the upstream repo. We'll backport it here at some point.


jhoole commented 6 years ago

I'm the developer who works with rvesse. We have the exact same stack trace, but I think our underlying problem is different. As far as we could figure out, the problem stems from the fact that Romana relies on entries in the routing table to route traffic from 10.* to 192.* addresses, and the headless service has no IP, so it never gets written to that table. Very long story short, I was able to work around it by changing

`val driverHostname = s"${driverService.getMetadata.getName}.$namespace.svc.cluster.local"`

in `resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/submitsteps/DriverServiceBootstrapStep.scala` to

`val driverHostname = s"${driverService.getMetadata.getName}"`

because the driver service name would also resolve to the IP of the driver pod, which is written to the routing table.
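The two hostname forms being compared above can be sketched as follows (the service name and namespace are hypothetical; this only illustrates the string the driver advertises, not the DNS behaviour itself):

```python
def driver_hostname(service_name: str, namespace: str, fqdn: bool = True) -> str:
    """Build the driver hostname either as the cluster-local FQDN
    (the original behaviour) or as the bare service name (the workaround)."""
    if fqdn:
        return f"{service_name}.{namespace}.svc.cluster.local"
    return service_name

# Original behaviour: resolves only via the headless Service's DNS record,
# which has no IP entry in Romana's routing table.
print(driver_hostname("spark-pi-driver-svc", "default"))

# Workaround: the bare service name, which (per the comment above) also
# resolves to the driver pod IP that Romana does write to the routing table.
print(driver_hostname("spark-pi-driver-svc", "default", fqdn=False))
```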

We can try this fix, but I'm doubtful it will resolve our case, as Romana still won't know how to talk to the node.

adelbertc commented 5 years ago

Superseded by https://github.com/apache/spark/pull/21884