Open jobh opened 2 years ago
Hi @jobh, thank you for reaching out and for sharing your approach to this problem.
Let me first clarify how updates are done. Starting from v1.22 updates (snap refreshes) do not stop the workloads running on the upgrading nodes. This means that the postgres pods will continue working while the k8s services restart. What MicroK8s version are you using?
Something that is not clear to me is why you say the pods cannot write to their hostpath-provisioned volumes while the services are stopping. I assume you are using the "storage" addon? The hostpaths are mounted inside the pods, and the k8s services are not involved during normal operation, so I would expect the mounts to remain valid while the services update/restart/refresh.
Calling microk8s stop is expected to stop the services on the node and also kill any running workloads [1]. In a multi-node cluster, or any long-lived cluster, you are not expected to run microk8s stop. Could you explain a bit more about the workflow you follow that involves calling the stop command?
Although the approach you have taken is specific to your use case, I can see that there may be a need to run custom scripts when shutting down the cluster. We could consider introducing hooks that users can use to inject custom behavior. For example, in [1] we could do something like:
If there is a user-provided hook:
    call the hook script
kill_all_container_shims
stop the k8s services
[1] https://github.com/ubuntu/microk8s/blob/master/microk8s-resources/wrappers/microk8s-stop.wrapper#L47
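The hook idea above could look roughly like this (a hypothetical sketch, not MicroK8s code; the hook directory under `$SNAP_COMMON` and the function name are illustrative assumptions):

```shell
#!/usr/bin/env bash
# Sketch: how microk8s-stop.wrapper could run a user-provided pre-stop hook
# before tearing the node down. Hook location is an assumed convention.

run_hook() {
  # Look for an executable hook under $SNAP_COMMON/hooks and run it if present.
  local hook="${SNAP_COMMON:-/var/snap/microk8s/common}/hooks/$1"
  if [ -x "$hook" ]; then
    "$hook" || echo "hook $1 exited with status $?" >&2
  fi
}

# Inside the stop wrapper, the shutdown sequence would then become:
#   run_hook pre-stop         # user script runs while workloads are still up
#   kill_all_container_shims  # existing wrapper step
#   (stop the k8s services)   # existing wrapper step
```

A failing hook is only reported, not fatal, so a broken user script cannot block the node from stopping.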
That's good to hear, that refreshes do not stop workloads. I'm on the 1.21 channel, but it sounds like a worthwhile update, and it would solve a lot. In my case, I'm under the additional constraint that my company's IT resources are sometimes rebooted automatically to apply security patches. That doesn't run microk8s stop explicitly, but it has the same effect by terminating the systemd services. So a robust solution should operate at the systemd level (sorry if this was unclear).
When I say "cannot write to hostpath", I'm referring to the results of the test shown in https://github.com/ubuntu/microk8s/issues/1022#issuecomment-1010031820. When redirecting its output to a file on the host, nothing was written to that file after microk8s stop, even though the process stayed alive for 30s according to ps. I may have misinterpreted this result, though.
@ktsakalozos,
I have been thinking about your hooks suggestion, and I think it is a good one. Combined with my idea of enforcing this at the systemd level, it could be done by an umbrella service like my example above, plus hook-runner functionality in microk8s:
[Service]
...
ExecStart=/snap/bin/microk8s run-hooks post-start
ExecStop=/snap/bin/microk8s run-hooks pre-stop
(There's also the question of how to add and manage hooks, of course.)
But is it possible to define this in snapcraft.yaml? It's not super clear how the snap daemons are mapped to systemd units.
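For what it's worth, snapd renders each daemon app as a generated systemd unit named `snap.<snap>.<app>.service`, so the kubelet daemon becomes `snap.microk8s.daemon-kubelet.service`, which is exactly the unit a drop-in directory like the one above targets. A small sketch of that mapping:

```shell
#!/usr/bin/env bash
# snapd maps each snap daemon to a generated unit "snap.<snap>.<app>.service".
# Tiny helper to build that unit name:
snap_unit_name() {
  printf 'snap.%s.%s.service' "$1" "$2"
}

# On a machine with MicroK8s installed, the generated units can be inspected:
#   systemctl list-units 'snap.microk8s.*'
#   systemctl cat "$(snap_unit_name microk8s daemon-kubelet)"
```

So the `ExecStart`/`ExecStop` lines sketched above would live either in snapcraft's app definition or in a systemd drop-in against one of these generated units.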
Hey @ktsakalozos, your message kind of surprises me, because it doesn't match what I experience with MicroK8s (version 1.25 in any case). For example, you wrote:
Starting from v1.22 updates (snap refreshes) do not stop the workloads running on the upgrading nodes.
That's not what the logs for snapd are reporting; instead:
$ systemctl status snapd.service
● snapd.service - Snap Daemon
...
Nov 10 14:58:35 hetzner-green snapd[2538454]: snapstate.go:1591: cannot refresh snap "microk8s": snap "microk8s" has running apps (kubectl, microk8s), pids: 2013283,201332>
Nov 10 14:58:35 hetzner-green snapd[2538454]: autorefresh.go:540: auto-refresh: all snaps are up-to-date
Nov 10 23:48:34 hetzner-green snapd[2538454]: storehelpers.go:748: cannot refresh: snap has no updates available: "core18", "snapd"
Nov 10 23:48:34 hetzner-green snapd[2538454]: snapstate.go:1591: cannot refresh snap "microk8s": snap "microk8s" has running apps (kubectl, microk8s), pids: 2013283,201332>
Nov 10 23:48:34 hetzner-green snapd[2538454]: autorefresh.go:540: auto-refresh: all snaps are up-to-date
...
And indeed, although MicroK8s was configured to track 1.25/stable, the version on my machine, installed a couple of months earlier, wasn't the latest revision of this channel.
You also wrote the following in your message:
Calling microk8s stop is expected to stop the services on the node and also kill any running workloads.
Interesting, because that's not the behavior that I see. On my machine, calling microk8s stop does stop the MicroK8s components, but all the various pods/containers of the workloads keep running.
In practice, which line in microk8s-stop.wrapper should be responsible for killing those workloads?
I see this kill_all_container_shims function (declared at https://github.com/canonical/microk8s/blob/master/microk8s-resources/actions/common/utils.sh#L692), but it seems to be concerned only with the Kubernetes services, isn't it?
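One way to check this empirically (just a diagnostic sketch, not part of MicroK8s): containerd keeps one shim process per pod sandbox, so any `containerd-shim` processes surviving `microk8s stop` mean surviving workloads.

```shell
#!/usr/bin/env bash
# After `microk8s stop`, report any container shims that are still alive;
# each containerd-shim process corresponds to a running pod sandbox.
check_leftover_shims() {
  local leftover
  leftover="$(pgrep -a -f containerd-shim || true)"
  if [ -n "$leftover" ]; then
    echo "workload processes still running:"
    echo "$leftover"
    return 1  # non-zero: workloads survived the stop
  fi
  echo "no container shims left"
}
```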
Thanks!
Hey fellas, I was wondering about a similar (stop) problem. I'm on MicroK8s v1.25.4 revision 4221 and also noticed that my workloads are not stopped when stopping MicroK8s. The core components of Kubernetes seem to shut down properly, but for example my neo4j database, which I installed using helm template and which consists of a StatefulSet, some services, and secrets, still keeps running. This is basically the only custom workload I have deployed.
These are my enabled addons:
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
dns # (core) CoreDNS
ha-cluster # (core) Configure high availability on the current node
helm # (core) Helm - the package manager for Kubernetes
helm3 # (core) Helm 3 - the package manager for Kubernetes
hostpath-storage # (core) Storage class; allocates storage from host directory
metallb # (core) Loadbalancer for your Kubernetes cluster
rbac # (core) Role-Based Access Control for authorisation
registry # (core) Private image registry exposed on localhost:32000
storage # (core) Alias to hostpath-storage add-on, deprecated
disabled:
cert-manager # (core) Cloud native certificate management
community # (core) The community addons repository
dashboard # (core) The Kubernetes dashboard
gpu # (core) Automatic enablement of Nvidia CUDA
host-access # (core) Allow Pods connecting to Host services smoothly
ingress # (core) Ingress controller for external access
kube-ovn # (core) An advanced network fabric for Kubernetes
mayastor # (core) OpenEBS MayaStor
metrics-server # (core) K8s Metrics Server for API access to service metrics
observability # (core) A lightweight observability stack for logs, traces and metrics
prometheus # (core) Prometheus operator for monitoring and logging
I noticed it when I enabled the OpenEBS Mayastor addon (which consumes 100% of one CPU by design). After shutting down with microk8s stop, the mayastor process was still running and still consuming 100% of one CPU.
I hope this is not too off-topic with regard to the original message, but I'm also interested in this issue :-D
Kind regards
This is also relevant when you have many microk8s nodes in some kind of autoscaling setup where you want them to be constantly stopped/started.
Also facing this problem; it seems that containerd does not shut down all its containers when exiting (although the code shows it sending a SIGKILL signal). I have found the only way to get rid of all the pod processes is to run something like the following:
#!/usr/bin/env bash
# Collect all PIDs tracked under the kubepods cgroup.
# Note: this is the cgroup v1 layout; on cgroup v2 the hierarchy lives under
# /sys/fs/cgroup/kubepods.slice instead.
readarray -t KUBEPOD_PIDS <<< "$(/usr/bin/find /sys/fs/cgroup/pids/kubepods -name tasks -exec /usr/bin/cat {} \;)"
if [[ -n "${KUBEPOD_PIDS[0]:-}" ]]; then
    # Send SIGTERM so pod processes can end gracefully
    /usr/bin/kill "${KUBEPOD_PIDS[@]}"
    # Wait 10s for the graceful stop
    /usr/bin/sleep 10s
    # Re-read the cgroup: anything still listed ignored SIGTERM
    readarray -t KUBEPOD_PIDS <<< "$(/usr/bin/find /sys/fs/cgroup/pids/kubepods -name tasks -exec /usr/bin/cat {} \;)"
    if [[ -n "${KUBEPOD_PIDS[0]:-}" ]]; then
        # Send SIGKILL to remove the remaining processes
        /usr/bin/kill -9 "${KUBEPOD_PIDS[@]}"
    fi
fi
There is a long discussion about snap auto-updates in issue #1022. I think the situation would be much improved if stateful pods were allowed to exit gracefully upon microk8s.stop. That would take away the data corruption problems, as per my comment there: https://github.com/ubuntu/microk8s/issues/1022#issuecomment-1010031820
To summarize: When microk8s.stop is issued, the k8s infrastructure is torn down before pods have had a chance to react to SIGTERM. Hence, even though they have 30 seconds to terminate gracefully, they cannot write to host-provisioned paths during this time, nor can they communicate over the network.
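A possible interim workaround along those lines (a sketch only, not tested against a real cluster; it relies on the standard `kubectl drain`/`uncordon` commands) is to evict pods while the control plane is still up, and only then stop the services:

```shell
#!/usr/bin/env bash
# Sketch: give pods their SIGTERM while kubelet and the API server are still
# running, by draining the node before `microk8s stop`.

graceful_microk8s_stop() {
  local node
  node="$(hostname)"
  # Evict pods first; at this point they can still reach the network
  # and write to their volumes while handling SIGTERM.
  microk8s kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=120s
  microk8s stop
}

graceful_microk8s_start() {
  local node
  node="$(hostname)"
  microk8s start
  microk8s status --wait-ready
  # Let workloads schedule on this node again.
  microk8s kubectl uncordon "$node"
}
```

This is essentially the drain/uncordon generalization suggested below, just driven from a wrapper script instead of systemd.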
My own (unfinished) workaround at this time is to create a new systemd service which Requires all the microk8s services, and itself waits for the graceful shutdown of postgresql. For the first time, I've seen these highly desired lines from postgresql. I've attached my systemd hacks below, but the reason for opening this issue is to discuss whether this could be generalized and maybe even made the default. Perhaps by replacing the postgresql-specific scaling with node drain/uncordon, or by shutting down the infrastructure in a "safe" order.
/etc/systemd/system/microk8s-sentry.service
Note! This is just a proof-of-concept, tested only briefly. For discussion.
/etc/systemd/system/snap.microk8s.daemon-kubelet.service.d/microk8s-sentry.conf
Add to upstream's [Unit] section to ensure the above service is started automatically along with microk8s. Again, just a proof-of-concept.
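For concreteness, a minimal hypothetical version of such a drop-in (an illustration only, not the author's attached file) could be:

```ini
# /etc/systemd/system/snap.microk8s.daemon-kubelet.service.d/microk8s-sentry.conf
[Unit]
# Pull the sentry service in whenever the kubelet unit starts. Because the
# sentry orders itself after the microk8s services (via After=/Requires= in
# its own unit), systemd stops it first on shutdown, giving it time to wind
# workloads down while kubelet is still running.
Wants=microk8s-sentry.service
```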