gothub opened 3 years ago
Update - the k8s pods kube-scheduler and kube-controller-manager have crashed and been restarted 1300+ times in the last 60 days. Here is a sample from the kube-controller-manager log:
...
goroutine 150 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc0009fcf50, 0xc000000001)
/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc0009fcf40)
/usr/local/go/src/sync/cond.go:56 +0x99
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.(*Type).Get(0xc000180b40, 0x0, 0x0, 0x3d96e00)
...
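For reference, one way to pull a sample like the one above and the restart counts (a sketch only; it assumes the pod name kube-controller-manager-docker-dev-ucsb-1 and the tier=control-plane label that kubeadm puts on its static pods, matching the status output further down):

# log from the container instance that crashed most recently
kubectl -n kube-system logs kube-controller-manager-docker-dev-ucsb-1 --previous | tail -n 100
# restart counts for the control plane pods (RESTARTS column)
kubectl -n kube-system get pods -l tier=control-plane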
The frequency of the warning messages increases to about 10 every few seconds when even a light processing load (e.g. assessment report jobs, 10 workers) is running. It may be that etcd slowness is causing the kube containers to crash.
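One way to test that hypothesis is to ask etcd directly about its health and latency. A minimal sketch, assuming the kubeadm cert paths shown in the etcd.yaml snippet below and that etcdctl is available inside the etcd pod:

kubectl -n kube-system exec etcd-docker-dev-ucsb-1 -- sh -c \
  'ETCDCTL_API=3 etcdctl \
     --endpoints=https://127.0.0.1:2379 \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key \
     endpoint status -w table'   # also useful: endpoint health, check perf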
A typical kube-controller-manager log that appears before a restart:
goroutine 956 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00133fad0, 0xc000000000)
/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc00133fac0)
/usr/local/go/src/sync/cond.go:56 +0x99
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.(*Type).Get(0xc00091f2c0, 0x0, 0x0, 0x3d96e00)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/queue.go:151 +0x89
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates.(*DynamicCertKeyPairContent).processNextWorkItem(0xc000171260, 0x203000)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_serving_content.go:153 +0x66
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates.(*DynamicCertKeyPairContent).runWorker(...)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_serving_content.go:148
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00088b390)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00088b390, 0x500ac60, 0xc001752a20, 0x4b24b01, 0xc0004cb080)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00088b390, 0x3b9aca00, 0x0, 0x1, 0xc0004cb080)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc00088b390, 0x3b9aca00, 0xc0004cb080)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates.(*DynamicCertKeyPairContent).Run
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_serving_content.go:134 +0x28b
The kube-scheduler and kube-controller-manager restarts appear to be due to request timeouts. These services restart when a request they make times out, so that another replica can become the new leader. Even though we only have a single-node control plane, these services will restart instead of retrying the request, as described in this issue: Kube-scheduler dies when connection to API server is lost
So this does not explain why the k8s api-server is not responding in time, but it does explain the restarts. It's possible that a multi-node control plane could compensate for the underlying problem, but the underlying problem itself has yet to be resolved.
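If the exit-on-lost-lease behavior can't be avoided, the leader-election timing itself can be loosened so that one slow API server response doesn't immediately kill the process. A sketch of the relevant flags (they would go in the kube-controller-manager and kube-scheduler static pod manifests; the values here are only illustrative and have not been tested on this cluster):

- --leader-elect-lease-duration=60s
- --leader-elect-renew-deadline=40s
- --leader-elect-retry-period=10s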
Here are some relevant log entries:
0809 13:10:32.679984 1 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: Get "https://128.111.85.190:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0809 13:10:30.675181 1 leaderelection.go:361] Failed to update lock: Put "https://128.111.85.190:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
A test was made on k8s dev to attempt to resolve the restart issue by increasing etcd config parameters in /etc/kubernetes/manifests/etcd.yaml. Here is a snippet from that file; the last two lines were added, and have the effect of increasing the heartbeat interval and the leader election timeout:
spec:
containers:
- command:
- etcd
- --advertise-client-urls=https://128.111.85.190:2379
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --client-cert-auth=true
- --data-dir=/var/lib/etcd
- --initial-advertise-peer-urls=https://128.111.85.190:2380
- --initial-cluster=docker-dev-ucsb-1=https://128.111.85.190:2380
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --listen-client-urls=https://127.0.0.1:2379,https://128.111.85.190:2379
- --listen-metrics-urls=http://127.0.0.1:2381
- --listen-peer-urls=https://128.111.85.190:2380
- --name=docker-dev-ucsb-1
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-client-cert-auth=true
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --snapshot-count=10000
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --heartbeat-interval=1000
- --election-timeout=10000
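Whether the new flags actually made it into the running pod can be checked from the pod's command line (a sketch; the pod name matches the status output below):

kubectl -n kube-system describe pod etcd-docker-dev-ucsb-1 | grep -E 'heartbeat-interval|election-timeout'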
When this file is updated (e.g. via the vi editor), etcd will automatically be restarted. This change was made yesterday afternoon and doesn't appear to have prevented the kube service restarts, as shown by a recent k8s status; see the values in parentheses (time since last restart) under the RESTARTS column:
kubectl get pods,services -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/calico-kube-controllers-78d6f96c7b-sld68 1/1 Running 646 120d 192.168.3.135 docker-dev-ucsb-1 <none> <none>
pod/calico-node-9jmnl 1/1 Running 174 (26d ago) 196d 128.111.85.191 docker-dev-ucsb-2 <none> <none>
pod/calico-node-fkvm8 1/1 Running 89 (47h ago) 196d 128.111.85.190 docker-dev-ucsb-1 <none> <none>
pod/coredns-78fcd69978-qktrm 1/1 Running 2 (25d ago) 120d 192.168.3.166 docker-dev-ucsb-1 <none> <none>
pod/coredns-78fcd69978-xjqjz 1/1 Running 2 (25d ago) 120d 192.168.3.144 docker-dev-ucsb-1 <none> <none>
pod/etcd-docker-dev-ucsb-1 1/1 Running 0 24h 128.111.85.190 docker-dev-ucsb-1 <none> <none>
pod/kube-apiserver-docker-dev-ucsb-1 1/1 Running 2 (4h21m ago) 47h 128.111.85.190 docker-dev-ucsb-1 <none> <none>
pod/kube-controller-manager-docker-dev-ucsb-1 1/1 Running 40 (3h46m ago) 47h 128.111.85.190 docker-dev-ucsb-1 <none> <none>
pod/kube-proxy-7hmdm 1/1 Running 1 (78d ago) 120d 128.111.85.191 docker-dev-ucsb-2 <none> <none>
pod/kube-proxy-x54mw 1/1 Running 2 (25d ago) 120d 128.111.85.190 docker-dev-ucsb-1 <none> <none>
pod/kube-scheduler-docker-dev-ucsb-1 1/1 Running 38 (3h46m ago) 47h 128.111.85.190 docker-dev-ucsb-1 <none> <none>
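For the restarts above, the exit code and reason of the previous container instance can be pulled straight from the pod status, which may help separate OOM kills from the leader-election exits (a sketch):

kubectl -n kube-system get pod kube-controller-manager-docker-dev-ucsb-1 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'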
@gothub mentioned this on our weekly dev call today and I thought I'd drop a data point in from my single-node cluster running on another NCEAS VM:
mecum@neutral-cat:~$ k get -n kube-system pods
NAME READY STATUS RESTARTS AGE
coredns-78fcd69978-ml7sl 1/1 Running 1 (3d3h ago) 38d
coredns-78fcd69978-xq7qt 1/1 Running 2 38d
etcd-neutral-cat 1/1 Running 6885 (2d7h ago) 38d
kube-apiserver-neutral-cat 1/1 Running 229 38d
kube-controller-manager-neutral-cat 1/1 Running 629 (4h4m ago) 38d
kube-proxy-dpvtb 1/1 Running 1 (3d3h ago) 38d
kube-scheduler-neutral-cat 1/1 Running 631 (4h4m ago) 38d
weave-net-t6vtk 2/2 Running 3 (3d2h ago) 38d
More restarts than seems remotely reasonable. I see a lot of warnings from etcd about deadlocks and timeouts but none clearly indicate a restart. Haven't made a good effort at figuring it out though.
Similar problem from the interwebs: https://platform9.com/kb/kubernetes/excessive-kubernetes-master-pod-restarts They say:
Cause
- Etcd heartbeats are timing out, resulting in frequent leader elections.
- The kube-controller-manager and kube-scheduler container logs show etcd read timeouts due to the leader elections, resulting in the restart of these containers.
Resolution
- The default values of heartbeat-interval and election-timeout are 100ms and 1000ms, respectively. For Azure, we've had to increase these values to 1000ms and 10000ms and these defaults are included in Platform9 Managed Kubernetes v4.1+.
Good find. If I grep my etcd logs for "error", I get:
WARNING: 2021/12/16 22:45:27 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
{"level":"warn","ts":"2021-12-16T22:45:50.129Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.000535152s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/health\" ","response":"","error":"context deadline exceeded"}
The dev k8s cluster is experiencing k8s system errors, with the kube-controller-manager, kube-scheduler, and etcd server showing many errors and restarts. Here is the pod and service info:
Here is a sample from the etcd log:
... and from kube-scheduler:
I'm not seeing error messages from kube-controller.
docker-dev-ucsb-1 has adequate disk space, and the system is running at a load average of "0.82 0.92 1.05".
Note that the production cluster is not experiencing these errors or restarts.