Summary

During the Sunbeam deployment described in https://microstack.run/docs/multi-node-maas, MicroK8s restarted and caused Juju charm hook failures, since the Kubernetes API endpoint was unavailable during the restart.
$ snap list microk8s
Name      Version  Rev   Tracking            Publisher   Notes
microk8s  v1.28.7  6532  1.28-strict/stable  canonical✓  -
May 29 06:08:00 machine-1 microk8s.daemon-kubelite[13121]: E0529 06:08:00.604719 13121 leaderelection.go:369] Failed to update lock: Put "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=15s": context deadline exceeded
May 29 06:08:00 machine-1 microk8s.daemon-kubelite[13121]: I0529 06:08:00.604859 13121 leaderelection.go:285] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
May 29 06:08:00 machine-1 microk8s.daemon-kubelite[13121]: E0529 06:08:00.604948 13121 controllermanager.go:302] "leaderelection lost"
May 29 06:08:01 machine-1 systemd[1]: snap.microk8s.daemon-kubelite.service: Main process exited, code=exited, status=1/FAILURE
May 29 06:08:01 machine-1 systemd[1]: snap.microk8s.daemon-kubelite.service: Failed with result 'exit-code'.
May 29 06:08:01 machine-1 systemd[1]: snap.microk8s.daemon-kubelite.service: Consumed 13min 7.868s CPU time.
May 29 06:08:01 machine-1 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 1.
May 29 06:08:01 machine-1 systemd[1]: Stopped Service for snap application microk8s.daemon-kubelite.
May 29 06:08:01 machine-1 systemd[1]: snap.microk8s.daemon-kubelite.service: Consumed 13min 7.868s CPU time.
May 29 06:08:01 machine-1 systemd[1]: Started Service for snap application microk8s.daemon-kubelite.
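The failing sequence above (lease renewal timing out, kube-controller-manager losing leadership, kubelite exiting, systemd scheduling a restart) can be picked out of a saved journal excerpt mechanically. A minimal sketch, with the excerpt inlined via a heredoc rather than pulled live from journalctl:

```shell
# Scan a saved kubelite journal excerpt for the two key events:
# the controller-manager losing leader election, and systemd scheduling a restart.
# On a real node the excerpt would come from:
#   journalctl -u snap.microk8s.daemon-kubelite.service
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
E0529 06:08:00.604948 13121 controllermanager.go:302] "leaderelection lost"
systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 1.
EOF

grep -c 'leaderelection lost' "$LOG"
grep -c 'Scheduled restart job' "$LOG"
```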
What Should Happen Instead?
MicroK8s shouldn't restart once the initial Kubernetes cluster deployment is complete.
Reproduction Steps

For what it's worth, 30 charms are deployed on top of a single MicroK8s node, and the CPU was heavily loaded by charm hook executions running concurrently with host processes such as MicroK8s itself (including dqlite). The 06:08:01 timestamp corresponds to 15:08:01 in the attached graph, which is when CPU saturation begins.

Unfortunately, there is no option to inject a custom value for --leader-elect-lease-duration through the microk8s charm to check whether it would mitigate the issue:
https://microk8s.io/docs/ref-launch-config
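If a launch configuration could be supplied, relaxing the leader-election timing would look roughly like this. This is a sketch based on my reading of the launch-config reference linked above; the 60s/40s values are illustrative, and whether the charm can deliver such a file at all is exactly the missing piece:

```yaml
# Hypothetical MicroK8s launch configuration
# (e.g. /var/snap/microk8s/common/.microk8s.yaml, per the launch-config docs).
# Relaxes kube-controller-manager leader-election timing; values are illustrative.
version: 0.1.0
extraKubeControllerManagerArgs:
  --leader-elect-lease-duration: 60s
  --leader-elect-renew-deadline: 40s
```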
More context is available at: https://bugs.launchpad.net/snap-openstack/+bug/2067451
Introspection Report
inspection-report-20240529_091424.tar.gz
sunbeam-inspection-report-20240529_071507.tar.gz