apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

WatchConnectionManager closes when k8s master restarts #588

Open duyanghao opened 6 years ago

duyanghao commented 6 years ago

The WatchConnectionManager always closes itself when I restart the k8s API server. How will this affect a running Spark application?

Step 1: kill the k8s API server, scheduler, and controller-manager.
Step 2: wait for a moment.
Step 3: restart the k8s API server, scheduler, and controller-manager.

The WatchConnectionManager always closes itself, even after it has reconnected successfully, as shown in the log below:

2018-01-02 11:32:26 DEBUG WatchConnectionManager: Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@1ba05e38
2018-01-02 11:32:26 DEBUG WatchConnectionManager: WebSocket successfully opened
2018-01-02 11:32:26 DEBUG KubernetesClusterSchedulerBackend: Executor pod watch closed.
io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 74322460 (74322904)
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onMessage(WatchConnectionManager.java:226)
        at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:307)
        at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:222)
        at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
        at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
        at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2018-01-02 11:32:26 DEBUG WatchConnectionManager: Force closing the watch io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@1ba05e38
2018-01-02 11:32:26 DEBUG WatchConnectionManager: Ignoring duplicate firing of onClose event
2018-01-02 11:32:26 DEBUG WatchConnectionManager: WebSocket close received. code: 1000, reason:
2018-01-02 11:32:26 DEBUG WatchConnectionManager: Ignoring onClose for already closed/closing websocket
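The key line in the log is the `KubernetesClientException: too old resource version: 74322460 (74322904)`. The reconnected watch tried to resume from resource version 74322460, which the restarted API server no longer retains, so it rejected the watch and the client closed it. The small Java sketch below pulls the two numbers out of that message. The class `TooOldResourceVersion` and the field names are hypothetical and not part of fabric8 or Spark. Reading the parenthesized number as the server's retained lower bound is an assumption based on the message shape.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a fabric8 or Spark API): extracts the numbers from a
// "too old resource version: A (B)" watch error message.
class TooOldResourceVersion {
    private static final Pattern MSG =
        Pattern.compile("too old resource version: (\\d+) \\((\\d+)\\)");

    // requested: version the watch tried to resume from.
    // reported: the number in parentheses (assumed to be the server-side bound).
    record Versions(long requested, long reported) {}

    static Optional<Versions> parse(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = MSG.matcher(message);
        if (!m.find()) {
            return Optional.empty(); // not an expired-version error
        }
        return Optional.of(new Versions(Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        String msg = "io.fabric8.kubernetes.client.KubernetesClientException: "
            + "too old resource version: 74322460 (74322904)";
        parse(msg).ifPresent(v ->
            System.out.println("requested " + v.requested() + ", server reported " + v.reported()));
    }
}
```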

I am afraid that the closing of the WatchConnectionManager will affect the running Spark application. The log above already confirms that the executor pod watch has been closed.
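In the Kubernetes API, a watch whose resource version has expired fails with HTTP 410 Gone. Resuming from the same version cannot succeed; the client has to LIST again to get a fresh resource version and then start a new WATCH from it. Any executor pod events in the gap are lost unless they are recovered from that LIST. The sketch below simulates this re-list-and-re-watch pattern in plain Java. `FakeApiServer`, `ResilientPodWatcher`, and `ResourceExpiredException` are all hypothetical stand-ins, not fabric8 or Spark classes. The current version 74322950 is an arbitrary illustrative value. It is not the fix the Spark scheduler back-end actually uses.

```java
// Hypothetical, self-contained simulation of list-then-watch recovery.
// None of these classes belong to fabric8 or Spark.
class ResourceExpiredException extends RuntimeException {
    ResourceExpiredException(String message) {
        super(message);
    }
}

class FakeApiServer {
    private final long currentVersion; // latest resourceVersion on the server
    private final long oldestRetained; // oldest version a watch may resume from

    FakeApiServer(long currentVersion, long oldestRetained) {
        this.currentVersion = currentVersion;
        this.oldestRetained = oldestRetained;
    }

    // LIST: a real call also returns the objects; here only the version is returned.
    long list() {
        return currentVersion;
    }

    // WATCH: fails like a 410 Gone response if the version is no longer retained.
    long watch(long fromVersion) {
        if (fromVersion < oldestRetained) {
            throw new ResourceExpiredException(
                "too old resource version: " + fromVersion + " (" + oldestRetained + ")");
        }
        return fromVersion; // the version the new watch starts from
    }
}

class ResilientPodWatcher {
    private final FakeApiServer server;
    private long lastSeenVersion;
    private int relists = 0;

    ResilientPodWatcher(FakeApiServer server, long lastSeenVersion) {
        this.server = server;
        this.lastSeenVersion = lastSeenVersion;
    }

    // Try to resume the watch; if the version has expired, re-list and watch again.
    long reconnect() {
        try {
            return server.watch(lastSeenVersion);
        } catch (ResourceExpiredException e) {
            relists++;
            // Events after lastSeenVersion are gone, so resync from a fresh LIST.
            lastSeenVersion = server.list();
            return server.watch(lastSeenVersion);
        }
    }

    int relists() {
        return relists;
    }
}

class WatchRecoveryDemo {
    public static void main(String[] args) {
        FakeApiServer server = new FakeApiServer(74322950L, 74322904L);
        ResilientPodWatcher watcher = new ResilientPodWatcher(server, 74322460L);
        long from = watcher.reconnect();
        System.out.println("watching from " + from + " after " + watcher.relists() + " re-list(s)");
    }
}
```

The design choice here is to treat the expired-version error as a signal to resynchronize rather than as a reason to stop watching. A driver that only logs the close would keep running without receiving further executor pod events.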