Open oba11 opened 7 years ago
Hi, I just ran into the very same issue. I have applied your workaround, 12 hours have passed and so far everything is running fine. Thanks a million for your effort and awesome debugging skills!
Btw, I'd like to understand the root cause. The garbage collector service somehow breaks flanneld, which in turn kills all pods on the node? And how is it fixed by restarting kubelet? Does that also reload flanneld? I just started digging into it…
I can trigger this issue at will, even with this fix enabled. More information at https://github.com/kubernetes/kubernetes/issues/49835
I'm observing similar problems, with Container Linux 1465.6.0 on a KVM environment, Kubernetes coreos/hyperkube:v1.6.4_coreos.0, and Flannel coreos/flannel:v0.7.1. I see that at the 12-hour mark the rkt garbage collector kicks in, and the end state is that all non-/pause Docker containers get repeatedly terminated.
However, one config difference is that my kubelet.service does not specify Requires= or After= flanneld.service, and does not have ExecStartPre=/usr/bin/systemctl is-active flanneld.service.
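For context, a minimal sketch of what those directives look like as a kubelet.service drop-in (the file path is only illustrative):
# /etc/systemd/system/kubelet.service.d/10-flanneld.conf  (illustrative path)
[Unit]
# Pull flanneld in as a hard dependency and order the kubelet after it
Requires=flanneld.service
After=flanneld.service

[Service]
# Refuse to start the kubelet unless flanneld is already active
ExecStartPre=/usr/bin/systemctl is-active flanneld.service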
State of the system while things are stuck:
# cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1465.6.0
VERSION_ID=1465.6.0
BUILD_ID=2017-08-16-0012
PRETTY_NAME="Container Linux by CoreOS 1465.6.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
# systemctl list-timers --all
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2017-09-08 16:16:06 SGT 1h 27min left Fri 2017-09-08 04:16:06 SGT 10h ago rkt-gc.timer rkt-gc.service
Fri 2017-09-08 16:31:06 SGT 1h 42min left Thu 2017-09-07 16:31:06 SGT 22h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sat 2017-09-09 00:00:00 SGT 9h left Fri 2017-09-08 00:00:12 SGT 14h ago logrotate.timer logrotate.service
n/a n/a n/a n/a update-engine-stub.timer update-engine-stub.service
4 timers listed.
# rkt list --full
UUID APP IMAGE NAME IMAGE ID STATE CREATED STARTED NETWORKS
15c22907-7307-4270-a260-c2fbd4b60f3d hyperkube quay.io/coreos/hyperkube:v1.6.4_coreos.0 sha512-3b5de58b2f2d exited garbage 2017-09-07 16:09:45 +0800 SGT 2017-09-07 16:09:45 +0800 SGT
4ee5856a-a8f8-4a61-8c5d-305382f2e888 hyperkube quay.io/coreos/hyperkube:v1.6.4_coreos.0 sha512-3b5de58b2f2d running 2017-09-07 16:16:06 +0800 SGT 2017-09-07 16:16:06 +0800 SGT
929aa77f-0e48-4c57-8ee6-d1a86164911a flannel quay.io/coreos/flannel:v0.7.1 sha512-f5a1ca0ed2aa running 2017-09-07 16:16:08 +0800 SGT 2017-09-07 16:16:08 +0800 SGT
95b6db3e-cb76-4bf2-aec4-a8e99cccb74d flannel quay.io/coreos/flannel:v0.7.1 sha512-f5a1ca0ed2aa exited garbage 2017-09-07 16:16:09 +0800 SGT 2017-09-07 16:16:09 +0800 SGT
c6233d72-6b24-4cf9-a39f-d42d9f69503b etcd quay.io/coreos/etcd:v3.1.8 sha512-b3659e21a58d running 2017-09-07 16:16:06 +0800 SGT 2017-09-07 16:16:06 +0800 SGT
Also, I do not see many QoS-related errors; only this one shows up, very infrequently:
W0906 20:35:12.290371 737 qos_container_manager_linux.go:133] [ContainerManager] Failed to reserve QoS requests: failed to set supported cgroup subsystems for cgroup /kubepods/burstable: Failed to set config for supported subsystems : failed to write 2032 to cpu.shares: open /var/lib/rkt/pods/run/53f9dbbf-c59e-40df-8baa-d9387f09f0c0/stage1/rootfs/opt/stage2/flannel/rootfs/sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/cpu.shares: no such file or directory
Update: I'm able to replicate this by rebooting my Container Linux nodes, waiting for them to come up, and then running rkt gc. This merely moves the exited flannel pod to exited garbage, without actually removing it since the 12-hour grace period hasn't been reached, but that's enough to trigger all the Docker containers to start crashing continuously. Trying rkt gc --grace-period=0s at this point fails.
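In other words, the reproduction boils down to the following steps (run as root on the affected node):
# Step 1: reboot the node and wait for kubelet, flanneld, and the pods to come up
systemctl reboot

# Step 2 (after the node is back): run a manual garbage collection; this only marks
# the exited flannel pod as "exited garbage", since the 12h grace period has not passed
rkt gc

# Step 3: the Docker containers now start crash-looping, and forcing collection fails
rkt gc --grace-period=0s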
If I just systemctl restart flanneld at this point, nothing gets fixed. After systemctl restart kubelet, the Docker containers come back up (and seem to stay up), and then rkt gc --grace-period=0s can complete.
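So the recovery sequence that worked for me is:
# Restarting flanneld alone does not fix anything
systemctl restart flanneld

# Restarting the kubelet brings the Docker containers back up
systemctl restart kubelet

# After that, forced garbage collection completes
rkt gc --grace-period=0s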
When I roll back Container Linux to 1353.7.0 (these nodes were recently upgraded from that to 1465.6.0, just before 1465.7.0 was released), I can't replicate this problem using the above steps. I would try to bisect the stable releases to determine when this started, but it seems extremely inconvenient to reverse-engineer the update_engine just to do that.
I can confirm this exact issue exists in CoreOS 1409.7.0 as well
After some further testing, I think our case is a bit different from @oba11's. We run the kubelet inside rkt and mount /var/lib/rkt into the kubelet rkt pod.
The workaround we found was to remove that mount (it's unnecessary since we're using --container-runtime=docker); after that, the rkt gc that moves the exited flannel-docker-opts rkt pod from exited to exited garbage completes normally without causing the CrashLoop.
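For anyone with a similar setup, here's roughly what the change amounts to. This is only a sketch that assumes the common kubelet-wrapper pattern where the mount is passed via RKT_RUN_ARGS in kubelet.service; adjust it to however your kubelet pod is actually launched:
# Before (sketch): /var/lib/rkt is mounted into the kubelet rkt pod
Environment="RKT_RUN_ARGS=--volume var-lib-rkt,kind=host,source=/var/lib/rkt --mount volume=var-lib-rkt,target=/var/lib/rkt"

# After: drop the volume/mount pair; it is unnecessary with --container-runtime=docker
Environment="RKT_RUN_ARGS="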
I found the likely root cause of this and I've updated kubernetes/kubernetes#49835.
I'm hopeful that https://github.com/google/cadvisor/pull/1792 will fix this, but I haven't done manual testing yet, so I can't be more confident.
Still experiencing the same issue as @oba11 with the following environment:
staging-worker-0 Ready 3m v1.9.0 192.168.2.21 Container Linux by CoreOS 1520.9.0 (Ladybug) 4.13.16-coreos-r1
The workaround of deleting the exited rkt pods created by flannel-docker-opts.service seems to work, but I was wondering if there has been a proper upstream fix for this?
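In case it helps, the cleanup I used is roughly this one-liner (a sketch; double-check what it matches on your node before removing anything):
# List all rkt pods, pick the exited flannel ones, and remove them by UUID
rkt list --full | grep flannel | grep exited | awk '{print $1}' | xargs -r -n 1 rkt rm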
Issue Report
Bug
Container Linux Version
Environment
Kubernetes Version: 1.6.3 AWS HVM
Expected Behavior
Kubernetes pods keep running indefinitely while relying on flannel as the network backend.
Actual Behavior
After around 12 hours, I noticed that all the pods on the node would end up in a crash loop. After a couple of days of debugging, I noticed that this was caused by the rkt garbage collector timer.
Not sure why, but I noticed that the rkt pod created by flannel-docker-opts.service only stops after successful execution. Even though I'm using Docker as the container backend, the kubelet tries to clean up the rkt pods, which is unsuccessful; but at exactly the 12-hour mark after the node was started, all containers on the kubelet start ending up in a crash loop until the kubelet is restarted.
A hack fix was to delete the exited rkt pods created by flannel-docker-opts.service, and the pods on the kubelet have been running without hiccup for a while now.
Reproduction Steps
Start kubelet after flannel with the below option
Other Information
Temporary fix was to add
ExecStop=-/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper2.uuid
as a drop-in to flannel-docker-opts.service
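For completeness, a sketch of that drop-in (the file name is arbitrary):
# /etc/systemd/system/flannel-docker-opts.service.d/10-rkt-rm.conf  (arbitrary name)
[Service]
# Remove the exited rkt pod when the unit stops, so the rkt garbage collector
# never finds a stale "exited" flannel pod to move to "exited garbage"
ExecStop=-/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper2.uuid
A systemctl daemon-reload followed by restarting flannel-docker-opts.service picks it up.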