Open lotharbach opened 7 months ago
/reopen
Some additional information, as I also just encountered a node where this problem occurred. On that specific node, there were multiple plugins defined for containerd, and all of them set the property SystemdCgroup = true.
This leads to unexpected output when the script tries to determine whether cgroupfs or systemd is configured for containerd. In the end, it executes this:
$ containerd config dump | grep SystemdCgroup
SystemdCgroup = true
SystemdCgroup = true
SystemdCgroup = true
SystemdCgroup = true
SystemdCgroup = true
This output is further processed and put into the variable systemd_cgroup_driver. We can look at the content and see that there are multiple true values separated by a space(!).
$ systemd_cgroup_driver=$(containerd config dump | grep SystemdCgroup | awk -F '=' '{print $2}' | sed 's/^\W//g')
$ echo $systemd_cgroup_driver
true true true true true
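A side note on what the variable actually holds: command substitution keeps the line breaks between the matches, and it is the unquoted echo that shows them joined by spaces. The sketch below uses sample data in place of the real containerd pipeline (the printf line is only a stand-in, not part of the script), so it runs on any machine:

```shell
# Stand-in for the pipeline output above: one "true" per matching plugin.
systemd_cgroup_driver=$(printf 'true\ntrue\ntrue\n')

# Unquoted: the shell splits the value into words and echo joins them with spaces.
echo $systemd_cgroup_driver

# Quoted: the embedded newlines are preserved, one value per line.
echo "$systemd_cgroup_driver"
```

Either way, the value is not the single word true, so a plain string comparison against true cannot match.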
When we try to process this with the if statement in the script, it doesn't work, because the comparison expects a single true instead of multiple ones. As a result, the function falls through to cgroupfs:
$ get_kubelet_cgroup_driver
cgroupfs
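One possible way to make the detection tolerant of multiple matches is sketched below. This is not the extension's actual fix: detect_cgroup_driver is a hypothetical helper name, and treating "at least one SystemdCgroup = true entry" as meaning systemd is an assumption that may need adjusting (for example, to require that all entries agree). The helper reads the config dump on stdin so it can be tried without containerd installed:

```shell
# Hypothetical helper (not from the extension): reads the output of
# `containerd config dump` on stdin and prints the detected driver.
# Assumption: a single "SystemdCgroup = true" line is enough to report systemd.
detect_cgroup_driver() {
    if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true[[:space:]]*$'; then
        echo systemd
    else
        echo cgroupfs
    fi
}

# Sample input mimicking a node with several plugins configured.
sample='SystemdCgroup = true
SystemdCgroup = true
SystemdCgroup = true'

printf '%s\n' "$sample" | detect_cgroup_driver   # prints: systemd
```

On a real node it would be called as containerd config dump | detect_cgroup_driver. Because grep -q only reports whether any line matched, the result no longer depends on how many plugins define the property.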
How to categorize this issue?
/area os /kind bug /os garden-linux
What happened: We got into what looks like the same situation described in #98 after upgrading the extension to 0.23.0. However, it is not a race condition for new nodes, as was the conclusion there, so I opted for a new issue.
For us, existing nodes somehow got their cgroup driver changed, and now pods can't spawn.
Warning FailedCreatePodSandBox 112s (x1613 over 41m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/kubepods/burstable/podc8032134-6f4d-4eab-a086-bafd11ef6dd2/4f4f22b56ac4bc4c998c28ba608f35f8ad733b08b7e1c525e8f70d2c928639ee" instead: unknown
Something about the "Don't touch existing nodes" check in #133 did not work. We rolled back to 0.22.0, which fixed the problem. I'll update this issue with further logs and information when I can reproduce it on a test seed.
What you expected to happen: Upgrade of the extension not to break existing nodes.
How to reproduce it (as minimally and precisely as possible): Upgrade os-gardenlinux from 0.22.0 to 0.23.0 (while updating gardener/gardenlet from 1.86.1 to 1.87.0 at the same time)
Anything else we need to know?:
Environment: