Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/
1.95k stars 305 forks source link

Service Catalog API server pod fails to connect to etcd #952

Closed dewnull closed 5 years ago

dewnull commented 5 years ago

What happened: The Service Catalog on AKS was working okay until last thursday when the API Server failed. It looks like it can't connect to the etcd pod.

I0508 01:12:22.768515       1 round_trippers.go:438] GET https://azcluster0-azcluster05-rg-81a3c3-0f0c1a12.hcp.centralus.azmk8s.io:443/apis/admissionregistration.k8s.io/v1beta1/validatingwebhookconfigurations?resourceVersion=49037864&timeout=5m8s&timeoutSeconds=308&watch=true 200 OK in 1 milliseconds
I0508 01:12:22.768551       1 round_trippers.go:444] Response Headers:
I0508 01:12:22.768559       1 round_trippers.go:447]     Content-Type: application/json
I0508 01:12:22.768564       1 round_trippers.go:447]     Date: Wed, 08 May 2019 01:12:22 GMT
I0508 01:12:22.770551       1 handler.go:143] service-catalog-apiserver: GET "/apis/servicecatalog.k8s.io/v1beta1/clusterserviceclasses" satisfied by gorestful with webservice /apis/servicecatalog.k8s.io/v1beta1
I0508 01:12:22.771104       1 handler.go:143] service-catalog-apiserver: GET "/apis/servicecatalog.k8s.io/v1beta1/serviceinstances" satisfied by gorestful with webservice /apis/servicecatalog.k8s.io/v1beta1
I0508 01:12:22.866746       1 handler.go:143] service-catalog-apiserver: GET "/apis/servicecatalog.k8s.io/v1beta1/clusterserviceplans" satisfied by gorestful with webservice /apis/servicecatalog.k8s.io/v1beta1
I0508 01:12:23.694979       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.695058       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.694979       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.695085       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.695116       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695133       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.695164       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695172       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695196       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.695232       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695190       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695265       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.695265       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695278       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695285       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695288       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.694994       1 retry.go:117] clientv3/retry: error "rpc error: code = Unavailable desc = transport is closing" on pinned endpoint "localhost:2379"
I0508 01:12:23.695027       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.695307       1 retry.go:101] clientv3/balancer: "localhost:2379" is marked unhealthy ("rpc error: code = Unavailable desc = transport is closing")
I0508 01:12:23.695308       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695322       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695341       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695013       1 retry.go:117] clientv3/retry: error "rpc error: code = Unavailable desc = transport is closing" on pinned endpoint "localhost:2379"
I0508 01:12:23.695351       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695353       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: []
I0508 01:12:23.695053       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695368       1 retry.go:117] clientv3/retry: switching from "localhost:2379" due to error "rpc error: code = Unavailable desc = transport is closing"
I0508 01:12:23.695370       1 retry.go:101] clientv3/balancer: "localhost:2379" is marked unhealthy ("rpc error: code = Unavailable desc = transport is closing")
I0508 01:12:23.695541       1 retry.go:117] clientv3/retry: switching from "localhost:2379" due to error "rpc error: code = Unavailable desc = transport is closing"
I0508 01:12:23.695555       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: []
I0508 01:12:23.694979       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.695373       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695612       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695642       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
W0508 01:12:23.695664       1 clientconn.go:1440] grpc: addrConn.transportMonitor exits due to: context canceled
I0508 01:12:23.695037       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695683       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.695684       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695416       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695056       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.695417       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695385       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.695742       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.695757       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695731       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695772       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695782       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695440       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.695802       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695810       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.695827       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.695869       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
W0508 01:12:23.766271       1 clientconn.go:1440] grpc: addrConn.transportMonitor exits due to: context canceled
I0508 01:12:23.766315       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.766398       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:23.766437       1 retry.go:117] clientv3/retry: error "rpc error: code = Unavailable desc = transport is closing" on pinned endpoint "localhost:2379"
I0508 01:12:23.766455       1 retry.go:101] clientv3/balancer: "localhost:2379" is marked unhealthy ("rpc error: code = Unavailable desc = transport is closing")
I0508 01:12:23.766474       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: []
I0508 01:12:23.766442       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")
I0508 01:12:23.766531       1 balancer_v1_wrapper.go:248] clientv3/balancer: unpin "localhost:2379" ("grpc: the connection is closing")
I0508 01:12:23.766561       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:23.766563       1 retry.go:117] clientv3/retry: switching from "localhost:2379" due to error "rpc error: code = Unavailable desc = transport is closing"
I0508 01:12:23.766615       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
W0508 01:12:23.771948       1 clientconn.go:1440] grpc: addrConn.transportMonitor exits due to: grpc: the connection is closing
W0508 01:12:23.771955       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.771988       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.771988       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.772031       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.772062       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.772101       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.772101       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.772117       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:23.772132       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:24.768996       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:24.769058       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:24.769074       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:25.269756       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:25.344398       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:25.362840       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:25.462942       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:25.571419       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0508 01:12:25.686567       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {localhost:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0508 01:12:26.267881       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:26.267931       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:26.297905       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:26.298006       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:26.568619       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:26.568700       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:27.367907       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:27.367945       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:27.667551       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:27.667596       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:27.887937       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:27.887977       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:28.131226       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:28.131266       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:28.176365       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:28.176399       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:28.569366       1 balancer_v1_wrapper.go:245] clientv3/balancer: pin "localhost:2379"
I0508 01:12:28.569401       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{localhost:2379 <nil>}]
I0508 01:12:35.194899       1 retry.go:117] clientv3/retry: error "rpc error: code = Unavailable desc = transport is closing" on pinned endpoint "localhost:2379"
I0508 01:12:35.194938       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:35.194947       1 retry.go:101] clientv3/balancer: "localhost:2379" is marked unhealthy ("rpc error: code = Unavailable desc = transport is closing")
I0508 01:12:35.194969       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: []
I0508 01:12:35.194974       1 log.go:30] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I0508 01:12:35.194993       1 health_balancer.go:423] clientv3/balancer: "localhost:2379" is marked unhealthy ("grpc: the connection is closing")

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

dewnull commented 5 years ago

https://github.com/kubernetes-incubator/service-catalog/issues/2521