coreos / container-linux-update-operator

A Kubernetes operator to manage updates of Container Linux by CoreOS
Apache License 2.0

Stuck on fail loop for master node during container linux update #174

Closed ghost closed 6 years ago

ghost commented 6 years ago

After an automatic Container Linux update (I'm not sure when it happened, and this unattended behavior is troublesome), one of the two master nodes in my cluster is stuck in a kubelet fail loop.

-- Logs begin at Sun 2018-01-28 20:24:12 UTC. --
Feb 13 21:23:55 ip-10-3-90-187 systemd[1]: kubelet.service: Failed with result 'timeout'.
Feb 13 21:23:55 ip-10-3-90-187 systemd[1]: Failed to start Kubelet via Hyperkube ACI.
Feb 13 21:24:05 ip-10-3-90-187 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Feb 13 21:24:05 ip-10-3-90-187 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 25.
Feb 13 21:24:05 ip-10-3-90-187 systemd[1]: Stopped Kubelet via Hyperkube ACI.
Feb 13 21:24:05 ip-10-3-90-187 systemd[1]: Starting Kubelet via Hyperkube ACI...
Feb 13 21:24:17 ip-10-3-90-187 rkt[1715]: rm: cannot get pod: no matches found for "226a1548-0f24-40cf-90fe-c823d24eefd3"
Feb 13 21:24:17 ip-10-3-90-187 rkt[1715]: rm: failed to remove one or more pods
Feb 13 21:24:16 ip-10-3-90-187 systemd[1]: Stopped Kubelet via Hyperkube ACI.
-- Reboot --
Feb 13 21:26:16 localhost systemd[1]: Starting Kubelet via Hyperkube ACI...
Feb 13 21:27:47 ip-10-3-90-187 systemd[1]: kubelet.service: Start-pre operation timed out. Terminating.
Feb 13 21:27:47 ip-10-3-90-187 systemd[1]: kubelet.service: Failed with result 'timeout'.
Feb 13 21:27:47 ip-10-3-90-187 systemd[1]: Failed to start Kubelet via Hyperkube ACI.
Feb 13 21:27:57 ip-10-3-90-187 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Feb 13 21:27:57 ip-10-3-90-187 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 1.
Feb 13 21:27:57 ip-10-3-90-187 systemd[1]: Stopped Kubelet via Hyperkube ACI.
Feb 13 21:27:57 ip-10-3-90-187 systemd[1]: Starting Kubelet via Hyperkube ACI...
Feb 13 21:29:28 ip-10-3-90-187 systemd[1]: kubelet.service: Start-pre operation timed out. Terminating.
Feb 13 21:29:28 ip-10-3-90-187 systemd[1]: kubelet.service: Failed with result 'timeout'.
Feb 13 21:29:28 ip-10-3-90-187 systemd[1]: Failed to start Kubelet via Hyperkube ACI.
Feb 13 21:29:38 ip-10-3-90-187 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Feb 13 21:29:38 ip-10-3-90-187 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 2.
Feb 13 21:29:38 ip-10-3-90-187 systemd[1]: Stopped Kubelet via Hyperkube ACI.
Feb 13 21:29:38 ip-10-3-90-187 systemd[1]: Starting Kubelet via Hyperkube ACI...
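
For anyone triaging a similar loop, a minimal Python sketch for summarizing journal output like the log above might look as follows. It only counts the "Start-pre operation timed out" messages and collects the restart counters; it does not diagnose the cause. The function name `summarize` and the sample text are hypothetical and not part of CLUO, Tectonic, or systemd.

```python
import re

# Matches the systemd restart-counter message seen in the journal above.
RESTART_RE = re.compile(
    r"kubelet\.service: Scheduled restart job, restart counter is at (\d+)\."
)
# Marker printed when the unit's start-pre step exceeds its timeout.
TIMEOUT_MARK = "kubelet.service: Start-pre operation timed out."


def summarize(journal_text):
    """Hypothetical helper: count start-pre timeouts and list restart counters."""
    counters = [int(n) for n in RESTART_RE.findall(journal_text)]
    return {
        "start_pre_timeouts": journal_text.count(TIMEOUT_MARK),
        "restart_counters": counters,
    }


# A few lines copied from the journal above, used as sample input.
SAMPLE = """\
Feb 13 21:24:05 ip-10-3-90-187 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 25.
Feb 13 21:27:47 ip-10-3-90-187 systemd[1]: kubelet.service: Start-pre operation timed out. Terminating.
Feb 13 21:27:57 ip-10-3-90-187 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 1.
Feb 13 21:29:28 ip-10-3-90-187 systemd[1]: kubelet.service: Start-pre operation timed out. Terminating.
Feb 13 21:29:38 ip-10-3-90-187 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 2.
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In the sample, the counter drops from 25 back to 1, which matches the reboot recorded in the log, and every post-reboot failure is a timeout in the unit's start-pre step rather than in the kubelet itself.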

dghubble commented 6 years ago

It's best to file this with Tectonic, as it uses its own variant of CLUO and has its own support structure in place.

sdemos commented 6 years ago

In addition to the differences in Tectonic's CLUO variant, this doesn't seem like an issue with CLUO. At first blush, it seems like an issue with Tectonic/Container Linux interop. I'm going to go ahead and close this issue out.

Please feel free to reopen this bug if it turns out to be an issue with CLUO itself!