Closed adelbertc closed 5 years ago
We have never run into this issue as reported here. @foxish.
I imagine that if a K8s cluster is secured with a network policy or something similar, pods are not exposed by default, which would make this change necessary.
@liyinan926 Just to confirm, we've seen a very similar issue with one specific network overlay (Romana), but have not encountered it with others.
@rvesse are you able to test this patch and see if it fixes your issue?
@liyinan926 We have a different patch internally that fixes our issue. I'm not sure it is the exact same problem, though it is clearly closely related. The developer who did the debugging and wrote the patch is out on jury duty until the middle of next week, so I am waiting for her to get back and take a look at this.
Can this be posted against apache/spark instead?
+1, please post the fix against the upstream repo. We'll backport it here at some point.
-- Anirudh Ramanathan
I'm the developer who works with rvesse. We have the exact same stack trace, but I think our underlying problem is different. As far as we could figure out, the problem is that Romana relies on entries in the routing table to route traffic from 10. to 192., and the headless service has no IP, so it doesn't get written to that table. Long story short, I was able to work around it by changing
val driverHostname = s"${driverService.getMetadata.getName}.$namespace.svc.cluster.local"
in resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/submitsteps/DriverServiceBootstrapStep.scala to
val driverHostname = s"${driverService.getMetadata.getName}"
because the driver service name would also resolve to the IP of the driver pod, which is written to the routing table.
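For readers following along, the two hostname forms quoted above can be sketched in isolation. This is only an illustration: `DriverHostname` and its methods are hypothetical names and are not part of Spark, and the fully qualified form assumes the default cluster domain `cluster.local`.

```scala
// Hypothetical helper, not part of Spark: builds the two hostname forms
// discussed in this thread.
object DriverHostname {
  // Fully qualified form used before the workaround.
  // Assumes the cluster DNS domain is the default "cluster.local".
  def fullyQualified(serviceName: String, namespace: String): String =
    s"$serviceName.$namespace.svc.cluster.local"

  // Short form used by the workaround: just the service name.
  // It relies on the pod's DNS search path, so it resolves only from
  // pods running in the same namespace as the service.
  def short(serviceName: String): String = serviceName
}
```

Note that the short form depends on the executors running in the same namespace as the driver service; from another namespace the bare service name would not resolve.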
We can try this fix, but I'm doubtful it will be a fix as Romana still won't know how to talk to the node.
Superseded by https://github.com/apache/spark/pull/21884
What changes were proposed in this pull request?
Expose ports explicitly in the driver container. The driver Service expects to reach the driver Pod at specific ports, which were not explicitly exposed before this change and would likely cause connection issues (see #617).
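The mismatch described here can be sketched as a simple check: given the ports a Service targets and the ports a container declares, list the ones that are missing. This is a minimal sketch using made-up model types (`ContainerPort`, `ServicePort`, `PortCheck`), not the fabric8 or Spark classes touched by the patch, and the port numbers in the usage note are assumed example values.

```scala
// Illustrative model types only; these are not the fabric8 or Spark classes.
final case class ContainerPort(name: String, containerPort: Int)
final case class ServicePort(name: String, targetPort: Int)

object PortCheck {
  // Returns the service ports whose target port is not declared
  // on the container.
  def undeclared(
      service: Seq[ServicePort],
      container: Seq[ContainerPort]): Seq[ServicePort] = {
    val declared = container.map(_.containerPort).toSet
    service.filterNot(p => declared.contains(p.targetPort))
  }
}
```

For example, with a Service targeting ports 7078 and 7079 and a container declaring only 7078, `PortCheck.undeclared` returns the entry for 7079. Once every targeted port is declared on the container, the result is empty.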
How was this patch tested?
Reproduced the failure in #617 on Kubernetes 1.6.x and 1.8.x. Built the driver image with this patch and observed that #617 was fixed on Kubernetes 1.6.x.