canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0
8.42k stars 769 forks source link

Microk8s restart during snap refresh did not remove pods #4691

Open pengwyn opened 2 days ago

pengwyn commented 2 days ago

Summary

An autorefresh of the microk8s snap broke the cluster, as it was unable to shutdown pods during the stopping of the service. This meant that once it started again, microk8s created many duplicate pods which stepped on each others toes for both emptyDir mounts and opened host ports.

This has happened to us twice on two different microk8s deployments. In the first case it was on a single-node microk8s, and the second was a 3-node cluster.

What Should Happen Instead?

Microk8s should cleanly stop pods. In the event of unforeseen problems, maybe some better resolution for handling left-over processes. To be honest I don't know what this could be, seems like a tricky situation to recover from.

Reproduction Steps

I'm not sure how to reproduce, or even what the cause is. In one deployment, the snap updated to version v1.28.13, although it is long enough ago I can't say from which version. In the other deployment, the snap updated to version v1.28.14. I believe this second deployment also autorefreshed to 1.28.13 at some point without trouble, so the problem seems to be dependant on external factors (maybe networking?)

Introspection Report

There are various messages in the syslog around the time of the refresh which indicate a problem. I will provide a sample of messages here (these are for the refresh of the single-node deployment to 1.28.13):

Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopping Service for snap application microk8s.daemon-cluster-agent...
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-cluster-agent.service: Deactivated successfully.
Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopped Service for snap application microk8s.daemon-cluster-agent.
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-cluster-agent.service: Consumed 2min 5.509s CPU time.
Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopping Service for snap application microk8s.daemon-k8s-dqlite...
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-k8s-dqlite[1510273]: time="2024-09-05T04:50:39Z" level=info msg="Shutting down\n"
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-k8s-dqlite[1510273]: time="2024-09-05T04:50:39Z" level=info msg="Dqlite stopped\n"
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-kubelite[1510288]: W0905 04:50:39.552739 1510288 logging.go:59] [core] [Channel #269 SubChannel #270] grpc: addrConn.createTransport failed to connect to {Addr: "unix:///var/snap/microk8s/7150/var/kubernetes/backend/kine.sock:12379", ServerName: "kine.sock", }. Err: connection error: desc = "transport: Error while dialing: dial unix /var/snap/microk8s/7150/var/kubernetes/backend/kine.sock:12379: connect: connection refused"
...

There are 100+ repeats of the connection refused message. After these we have:

Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopping Service for snap application microk8s.daemon-apiserver-kicker...
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-apiserver-kicker.service: Deactivated successfully.
Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopped Service for snap application microk8s.daemon-apiserver-kicker.
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-apiserver-kicker.service: Consumed 15h 25min 1.291s CPU time.
Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopping Service for snap application microk8s.daemon-kubelite...
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-kubelite[1510288]: I0905 04:50:39.628298 1510288 secure_serving.go:258] Stopped listening on [::]:10259
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-kubelite[1510288]: I0905 04:50:39.630582 1510288 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-kubelite[1510288]: I0905 04:50:39.660512 1510288 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/var/snap/microk8s/7150/certs/ca.crt"
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-kubelite[1510288]: I0905 04:50:39.662871 1510288 daemon.go:59] Stopping Kubelet
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-kubelite[1510288]: Stopping kubelite
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-kubelite.service: Deactivated successfully.
Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopped Service for snap application microk8s.daemon-kubelite.
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-kubelite.service: Consumed 5d 1h 44min 9.399s CPU time.
Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopping Service for snap application microk8s.daemon-containerd...
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-containerd[1509860]: time="2024-09-05T04:50:39.931345072Z" level=info msg="Stop CRI service"
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-containerd[1509860]: time="2024-09-05T04:50:39.938819623Z" level=info msg="Stop CRI service"
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-containerd[1509860]: time="2024-09-05T04:50:39.938921883Z" level=info msg="Event monitor stopped"
Sep  5 04:50:39 juju-657122-30 microk8s.daemon-containerd[1509860]: time="2024-09-05T04:50:39.938940979Z" level=info msg="Stream server stopped"
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-containerd.service: Deactivated successfully.
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-containerd.service: Unit process 64041 (containerd-shim) remains running after unit stopped.

Again there are 100+ repeats of the "process remains running after unit stopped" message.

Sep  5 04:50:39 juju-657122-30 systemd[1]: Stopped Service for snap application microk8s.daemon-containerd.
Sep  5 04:50:39 juju-657122-30 systemd[1]: snap.microk8s.daemon-containerd.service: Consumed 1d 7h 24min 4.243s CPU time.
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-containerd.service
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-k8s-dqlite.service
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-kubelite.service
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-apiserver-kicker.service
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-flanneld.service
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-apiserver-proxy.service
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-cluster-agent.service
Sep  5 04:50:40 juju-657122-30 snapd[499054]: services.go:1067: RemoveSnapServices - disabling snap.microk8s.daemon-etcd.service
...
Sep  5 04:50:55 juju-657122-30 systemd[1]: Started snap.microk8s.hook.post-refresh-98deb83e-f734-489b-a8db-079ab692ee5d.scope.
Sep  5 04:50:55 juju-657122-30 systemd[1]: snap.microk8s.hook.post-refresh-98deb83e-f734-489b-a8db-079ab692ee5d.scope: Deactivated successfully.
Sep  5 04:50:55 juju-657122-30 systemd[1]: Reloading.
Sep  5 04:50:57 juju-657122-30 systemd[1]: Configuration file /run/systemd/system/netplan-ovs-cleanup.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Sep  5 04:50:59 juju-657122-30 systemd[1]: Started Service for snap application microk8s.daemon-apiserver-kicker.
Sep  5 04:50:59 juju-657122-30 systemd[1]: Started Service for snap application microk8s.daemon-apiserver-proxy.
Sep  5 04:50:59 juju-657122-30 systemd[1]: Started Service for snap application microk8s.daemon-k8s-dqlite.
Sep  5 04:50:59 juju-657122-30 systemd[1]: Started Service for snap application microk8s.daemon-etcd.
Sep  5 04:50:59 juju-657122-30 systemd[1]: snap.microk8s.daemon-containerd.service: Found left-over process 64041 (containerd-shim) in control group while starting unit. Ignoring.
Sep  5 04:50:59 juju-657122-30 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
...

And 100+ repeats of the "Found left-over process..."

Sep  5 04:50:59 juju-657122-30 systemd[1]: Starting Service for snap application microk8s.daemon-containerd...
...
Sep  5 04:51:01 juju-657122-30 microk8s.daemon-containerd[3352598]: time="2024-09-05T04:51:01.032704314Z" level=error msg="failed to resolve runtime path" err
or="invalid custom binary path: stat /snap/microk8s/6750/bin/containerd-shim-runc-v2: no such file or directory"

From comparing to the 3-node cluster microk8s deployment, I see very similar messages when it failed.

pengwyn commented 2 days ago

I realised I still have access to the logs when the 3-node cluster autorefreshed to version v1.28.13. It appears as though a very similar "failure" happened at that time too, although the cluster itself continued to work. By "failure" I mean that I see:

It is possible that this cluster was broken from this time and our monitoring didn't pick up on it. This seems very unlikely though, as the later autorefresh did cause some problems.