gardener / gardener-extension-os-gardenlinux

Gardener extension controller for the Garden Linux operating system
Apache License 2.0
9 stars 31 forks source link

cgroup driver configured inconsistently for containerd and kubelet #98

Closed MrBatschner closed 10 months ago

MrBatschner commented 1 year ago

How to categorize this issue?

/area os /kind bug /os garden-linux

What happened:

The scripts introduced in PR #82 and PR #85 are supposed to configure containerd's cgroup driver according to the cgroup version of the underlying host and to make sure that the kubelet is configured with the corresponding cgroup driver. Both reconfigurations happen by running a script in the respectives component's ExecStartPre= before it comes up.

Since these changes were released to Gardener landscapes, they proved to be reliable for most of the time. However, we came across at least one instance of a new node with Garden Linux 934.7 and hence with cgroup v2 being scheduled for an existing worker pool which failed to join the cluster due to an inconsistent cgroup driver configuration: containerd ran with the systemd cgroup driver but the kubelet still ran with the cgroupfs cgroup driver.

From the logs of the internal support ticket:

2023-05-10 06:50:55 | {"_entry":"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod8d7a925b-ba2f-4fcd-9eb7-6a933858f3df/d6f8417b97f66a8e2fd5b7c23cf01222a1b7b476c29c12b1d557c219061e96db\" instead: unknown","count":1,"firstTimestamp":"2023-05-10T06:50:55Z","lastTimestamp":"2023-05-10T06:50:55Z","namespace":"kube-system","object":"Pod/egress-filter-applier-sx4d4","origin":"shoot","reason":"FailedCreatePodSandBox","source":"kubelet","sourceHost":"shoot--hyperspace--prod-int-worker-gqla1-z1-67c55-ntm74","type":"Warning"}
2023-05-10 06:50:55 | {"_entry":"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod2eed3a6f-3e07-42f9-9219-ba72f7f66c47/d85c081a0e7821d8e5a853ee48e97bc0bf2cebb8050b4ff9ac0a3ca9e10ae50b\" instead: unknown","count":1,"firstTimestamp":"2023-05-10T06:50:55Z","lastTimestamp":"2023-05-10T06:50:55Z","namespace":"kube-system","object":"Pod/apiserver-proxy-6fdfb","origin":"shoot","reason":"FailedCreatePodSandBox","source":"kubelet","sourceHost":"shoot--hyperspace--prod-int-worker-gqla1-z1-67c55-ntm74","type":"Warning"}Show context
2023-05-10 06:50:55 | {"_entry":"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod0afb0879-f205-4b66-bc0b-28ff62c898b2/167ccf567fa2cd2a438b18b63b158c0bb21c859559e93924f2f2d1234c35d17b\" instead: unknown","count":1,"firstTimestamp":"2023-05-10T06:50:55Z","lastTimestamp":"2023-05-10T06:50:55Z","namespace":"kube-system","object":"Pod/node-local-dns-hhns4","origin":"shoot","reason":"FailedCreatePodSandBox","source":"kubelet","sourceHost":"shoot--hyperspace--prod-int-worker-gqla1-z1-67c55-ntm74","type":"Warning"}
2023-05-10 06:50:55 | {"_entry":"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/poda592f80e-1070-482b-94db-1b070e836f7a/930ac369e7035000c8810e6c72b720ff73485834269738d930d18ef96b489874\" instead: unknown","count":1,"firstTimestamp":"2023-05-10T06:50:55Z","lastTimestamp":"2023-05-10T06:50:55Z","namespace":"kube-system","object":"Pod/csi-driver-node-gm8p4","origin":"shoot","reason":"FailedCreatePodSandBox","source":"kubelet","sourceHost":"shoot--hyperspace--prod-int-worker-gqla1-z1-67c55-ntm74","type":"Warning"}

These log message are a strong pointer to the kubelet creating a classic cgroupfs hierarchy which containerd then has not clue about what to do with.

What you expected to happen:

Containerd and kubelet are always running with the same cgroup driver and log message like the ones given above do not occur.

How to reproduce it (as minimally and precisely as possible):

Steps to reproduce need to be investigated. It seems that this is caused by a timing issue or a race condition.

MrBatschner commented 1 year ago

Problem was investigated and reproduced. It leads down to a probable race. The following things can be found in the logs.

Logs

containerd

# journalctl -u containerd.service

Aug 10 12:47:54 shoot--project--cluster systemd[1]: Starting containerd container runtime...
Aug 10 12:47:54 shoot--project--cluster containerd[632]: time="2023-08-10T12:47:54.256491336Z" level=info msg="starting containerd" revision="1.6.20~ds1-1gardenlinux3" version="1.6.20~ds1"
...
Aug 10 12:47:54 shoot--project--cluster systemd[1]: Started containerd container runtime.
Aug 10 12:47:54 shoot--project--cluster containerd[632]: time="2023-08-10T12:47:54.338506007Z" level=info msg="containerd successfully booted in 0.089254s"
...
Aug 10 12:48:09 shoot--project--cluster containerd[632]: time="2023-08-10T12:48:09.099968316Z" level=warning msg="cleanup warnings time=\"2023-08-10T12:48:09Z\" level=info msg=\"starting signal loop\" namespace=moby pid=1025 runtime=io.containerd.runc.v2\n"
Aug 10 12:48:19 shoot--project--cluster systemd[1]: Stopping containerd container runtime...
Aug 10 12:48:19 shoot--project--cluster containerd[632]: time="2023-08-10T12:48:19.645043636Z" level=info msg="Stop CRI service"
Aug 10 12:48:19 shoot--project--cluster systemd[1]: containerd.service: Deactivated successfully.
Aug 10 12:48:19 shoot--project--cluster systemd[1]: Stopped containerd container runtime.
Aug 10 12:48:20 shoot--project--cluster systemd[1]: Starting containerd container runtime...
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: /run/containerd/io.containerd.runtime.v2.task/k8s.io does not exists - no tasks in k8s.io namespace
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: Configuring containerd cgroup driver to systemd
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3063]: activating
Aug 10 12:48:20 shoot--project--cluster containerd[3064]: time="2023-08-10T12:48:20.488933711Z" level=info msg="starting containerd" revision="1.6.20~ds1-1gardenlinux3" version="1.6.20~ds1"
...
Aug 10 12:48:20 shoot--project--cluster systemd[1]: Started containerd container runtime.
Aug 10 12:48:20 shoot--project--cluster containerd[3064]: time="2023-08-10T12:48:20.565626487Z" level=info msg="containerd successfully booted in 0.082965s"
...

kubelet

# journalctl -u kubelet.service

Aug 10 12:48:19 shoot--project--cluster systemd[1]: Starting kubelet daemon...
Aug 10 12:48:19 shoot--project--cluster copy-kubernetes-binary.sh[2949]: Checking whether to copy new kubelet binary from hyperkube image to /opt/bin folder...
Aug 10 12:48:19 shoot--project--cluster copy-kubernetes-binary.sh[2949]: kubelet binary in /opt/bin is outdated (image used for last copy: ). Need to update it to eu.gcr.io/sap-se-gcr-k8s-public/eu_gcr_io/gardener-project/hyperkube@sha256:24a65f4e5f4a508f8c164cec9bd3dbde5fd263dd5816ed819f0d8d5210ffe8d5.
Aug 10 12:48:20 shoot--project--cluster kubelet_cgroup_driver.sh[3038]: kubelet and containerd are configured with the same cgroup driver (cgroupfs) - nothing to do
Aug 10 12:48:20 shoot--project--cluster systemd[1]: Started kubelet daemon.
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
...
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587896    3075 flags.go:64] FLAG: --address="0.0.0.0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587916    3075 flags.go:64] FLAG: --allowed-unsafe-sysctls="[]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587938    3075 flags.go:64] FLAG: --anonymous-auth="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587947    3075 flags.go:64] FLAG: --application-metrics-count-limit="100"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587956    3075 flags.go:64] FLAG: --authentication-token-webhook="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587964    3075 flags.go:64] FLAG: --authentication-token-webhook-cache-ttl="2m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587977    3075 flags.go:64] FLAG: --authorization-mode="AlwaysAllow"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587986    3075 flags.go:64] FLAG: --authorization-webhook-cache-authorized-ttl="5m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587993    3075 flags.go:64] FLAG: --authorization-webhook-cache-unauthorized-ttl="30s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587999    3075 flags.go:64] FLAG: --azure-container-registry-config=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588005    3075 flags.go:64] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588013    3075 flags.go:64] FLAG: --bootstrap-kubeconfig="/var/lib/kubelet/kubeconfig-bootstrap"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588020    3075 flags.go:64] FLAG: --cert-dir="/var/lib/kubelet/pki"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588034    3075 flags.go:64] FLAG: --cgroup-driver="cgroupfs"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588043    3075 flags.go:64] FLAG: --cgroup-root=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588049    3075 flags.go:64] FLAG: --cgroups-per-qos="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588056    3075 flags.go:64] FLAG: --client-ca-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588062    3075 flags.go:64] FLAG: --cloud-config=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588068    3075 flags.go:64] FLAG: --cloud-provider="external"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588092    3075 flags.go:64] FLAG: --cluster-dns="[]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588114    3075 flags.go:64] FLAG: --cluster-domain=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588124    3075 flags.go:64] FLAG: --config="/var/lib/kubelet/config/kubelet"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588131    3075 flags.go:64] FLAG: --container-hints="/etc/cadvisor/container_hints.json"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588139    3075 flags.go:64] FLAG: --container-log-max-files="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588161    3075 flags.go:64] FLAG: --container-log-max-size="10Mi"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588167    3075 flags.go:64] FLAG: --container-runtime="remote"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588174    3075 flags.go:64] FLAG: --container-runtime-endpoint="unix:///run/containerd/containerd.sock"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588181    3075 flags.go:64] FLAG: --containerd="/run/containerd/containerd.sock"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588193    3075 flags.go:64] FLAG: --containerd-namespace="k8s.io"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588202    3075 flags.go:64] FLAG: --contention-profiling="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588209    3075 flags.go:64] FLAG: --cpu-cfs-quota="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588216    3075 flags.go:64] FLAG: --cpu-cfs-quota-period="100ms"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588223    3075 flags.go:64] FLAG: --cpu-manager-policy="none"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588229    3075 flags.go:64] FLAG: --cpu-manager-policy-options=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588238    3075 flags.go:64] FLAG: --cpu-manager-reconcile-period="10s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588249    3075 flags.go:64] FLAG: --enable-controller-attach-detach="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588255    3075 flags.go:64] FLAG: --enable-debugging-handlers="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588262    3075 flags.go:64] FLAG: --enable-load-reader="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588269    3075 flags.go:64] FLAG: --enable-server="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588275    3075 flags.go:64] FLAG: --enforce-node-allocatable="[pods]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588294    3075 flags.go:64] FLAG: --event-burst="10"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588300    3075 flags.go:64] FLAG: --event-qps="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588311    3075 flags.go:64] FLAG: --event-storage-age-limit="default=0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588318    3075 flags.go:64] FLAG: --event-storage-event-limit="default=0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588325    3075 flags.go:64] FLAG: --eviction-hard=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588336    3075 flags.go:64] FLAG: --eviction-max-pod-grace-period="0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588342    3075 flags.go:64] FLAG: --eviction-minimum-reclaim=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588348    3075 flags.go:64] FLAG: --eviction-pressure-transition-period="5m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588355    3075 flags.go:64] FLAG: --eviction-soft=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588372    3075 flags.go:64] FLAG: --eviction-soft-grace-period=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588379    3075 flags.go:64] FLAG: --exit-on-lock-contention="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588395    3075 flags.go:64] FLAG: --experimental-allocatable-ignore-eviction="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588403    3075 flags.go:64] FLAG: --experimental-mounter-path=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588409    3075 flags.go:64] FLAG: --fail-swap-on="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588416    3075 flags.go:64] FLAG: --feature-gates=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588424    3075 flags.go:64] FLAG: --file-check-frequency="20s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588434    3075 flags.go:64] FLAG: --global-housekeeping-interval="1m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588440    3075 flags.go:64] FLAG: --hairpin-mode="promiscuous-bridge"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588448    3075 flags.go:64] FLAG: --healthz-bind-address="127.0.0.1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588455    3075 flags.go:64] FLAG: --healthz-port="10248"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588462    3075 flags.go:64] FLAG: --help="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588487    3075 flags.go:64] FLAG: --hostname-override=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588496    3075 flags.go:64] FLAG: --housekeeping-interval="10s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588504    3075 flags.go:64] FLAG: --http-check-frequency="20s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588516    3075 flags.go:64] FLAG: --image-credential-provider-bin-dir=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588522    3075 flags.go:64] FLAG: --image-credential-provider-config=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588528    3075 flags.go:64] FLAG: --image-gc-high-threshold="85"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588535    3075 flags.go:64] FLAG: --image-gc-low-threshold="80"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588541    3075 flags.go:64] FLAG: --image-service-endpoint=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588557    3075 flags.go:64] FLAG: --iptables-drop-bit="15"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588565    3075 flags.go:64] FLAG: --iptables-masquerade-bit="14"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588575    3075 flags.go:64] FLAG: --keep-terminated-pod-volumes="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588582    3075 flags.go:64] FLAG: --kernel-memcg-notification="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588588    3075 flags.go:64] FLAG: --kube-api-burst="10"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588595    3075 flags.go:64] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588601    3075 flags.go:64] FLAG: --kube-api-qps="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588609    3075 flags.go:64] FLAG: --kube-reserved=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588618    3075 flags.go:64] FLAG: --kube-reserved-cgroup=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588628    3075 flags.go:64] FLAG: --kubeconfig="/var/lib/kubelet/kubeconfig-real"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588635    3075 flags.go:64] FLAG: --kubelet-cgroups=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588641    3075 flags.go:64] FLAG: --local-storage-capacity-isolation="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588648    3075 flags.go:64] FLAG: --lock-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588654    3075 flags.go:64] FLAG: --log-cadvisor-usage="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588660    3075 flags.go:64] FLAG: --log-flush-frequency="5s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588667    3075 flags.go:64] FLAG: --log-json-info-buffer-size="0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588681    3075 flags.go:64] FLAG: --log-json-split-stream="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588687    3075 flags.go:64] FLAG: --logging-format="text"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588704    3075 flags.go:64] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588713    3075 flags.go:64] FLAG: --make-iptables-util-chains="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588720    3075 flags.go:64] FLAG: --manifest-url=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588728    3075 flags.go:64] FLAG: --manifest-url-header=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588737    3075 flags.go:64] FLAG: --master-service-namespace="default"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588747    3075 flags.go:64] FLAG: --max-open-files="1000000"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588757    3075 flags.go:64] FLAG: --max-pods="110"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588764    3075 flags.go:64] FLAG: --maximum-dead-containers="-1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588770    3075 flags.go:64] FLAG: --maximum-dead-containers-per-container="1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588776    3075 flags.go:64] FLAG: --memory-manager-policy="None"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588782    3075 flags.go:64] FLAG: --minimum-container-ttl-duration="0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588789    3075 flags.go:64] FLAG: --minimum-image-ttl-duration="2m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588795    3075 flags.go:64] FLAG: --node-ip=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588805    3075 flags.go:64] FLAG: --node-labels="kubernetes.io/arch=amd64,networking.gardener.cloud/node-local-dns-enabled=true,node.kubernetes.io/role=node,worker.garden.sapcloud.io/group=worker-1,worker.gardener.cloud/cri-name=containerd,worker.gardener.cloud/image-name=gardenlinux,worker.gardener.cloud/image-version=934.9.0,worker.gardener.cloud/kubernetes-version=1.26.7,worker.gardener.cloud/pool=worker-1,worker.gardener.cloud/system-components=true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588869    3075 flags.go:64] FLAG: --node-status-max-images="50"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588876    3075 flags.go:64] FLAG: --node-status-update-frequency="10s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588883    3075 flags.go:64] FLAG: --oom-score-adj="-999"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588889    3075 flags.go:64] FLAG: --pod-cidr=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588906    3075 flags.go:64] FLAG: --pod-infra-container-image="registry.k8s.io/pause:3.9"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588917    3075 flags.go:64] FLAG: --pod-manifest-path=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588928    3075 flags.go:64] FLAG: --pod-max-pids="-1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588934    3075 flags.go:64] FLAG: --pods-per-core="0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588941    3075 flags.go:64] FLAG: --port="10250"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588950    3075 flags.go:64] FLAG: --protect-kernel-defaults="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588957    3075 flags.go:64] FLAG: --provider-id=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588963    3075 flags.go:64] FLAG: --qos-reserved=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588970    3075 flags.go:64] FLAG: --read-only-port="10255"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588979    3075 flags.go:64] FLAG: --register-node="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588986    3075 flags.go:64] FLAG: --register-schedulable="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588992    3075 flags.go:64] FLAG: --register-with-taints=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589000    3075 flags.go:64] FLAG: --registry-burst="10"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589006    3075 flags.go:64] FLAG: --registry-qps="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589012    3075 flags.go:64] FLAG: --reserved-cpus=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589018    3075 flags.go:64] FLAG: --reserved-memory=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589030    3075 flags.go:64] FLAG: --resolv-conf="/etc/resolv.conf"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589039    3075 flags.go:64] FLAG: --root-dir="/var/lib/kubelet"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589056    3075 flags.go:64] FLAG: --rotate-certificates="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589064    3075 flags.go:64] FLAG: --rotate-server-certificates="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589071    3075 flags.go:64] FLAG: --runonce="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589080    3075 flags.go:64] FLAG: --runtime-cgroups="/system.slice/containerd.service"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589087    3075 flags.go:64] FLAG: --runtime-request-timeout="2m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589097    3075 flags.go:64] FLAG: --seccomp-default="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589104    3075 flags.go:64] FLAG: --serialize-image-pulls="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589115    3075 flags.go:64] FLAG: --storage-driver-buffer-duration="1m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589122    3075 flags.go:64] FLAG: --storage-driver-db="cadvisor"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589129    3075 flags.go:64] FLAG: --storage-driver-host="localhost:8086"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589135    3075 flags.go:64] FLAG: --storage-driver-password="root"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589141    3075 flags.go:64] FLAG: --storage-driver-secure="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589148    3075 flags.go:64] FLAG: --storage-driver-table="stats"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589159    3075 flags.go:64] FLAG: --storage-driver-user="root"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589165    3075 flags.go:64] FLAG: --streaming-connection-idle-timeout="4h0m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589174    3075 flags.go:64] FLAG: --sync-frequency="1m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589180    3075 flags.go:64] FLAG: --system-cgroups=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589186    3075 flags.go:64] FLAG: --system-reserved=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589203    3075 flags.go:64] FLAG: --system-reserved-cgroup=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589213    3075 flags.go:64] FLAG: --tls-cert-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589223    3075 flags.go:64] FLAG: --tls-cipher-suites="[]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589246    3075 flags.go:64] FLAG: --tls-min-version=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589252    3075 flags.go:64] FLAG: --tls-private-key-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589258    3075 flags.go:64] FLAG: --topology-manager-policy="none"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589265    3075 flags.go:64] FLAG: --topology-manager-policy-options=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589271    3075 flags.go:64] FLAG: --topology-manager-scope="container"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589278    3075 flags.go:64] FLAG: --v="2"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589291    3075 flags.go:64] FLAG: --version="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589301    3075 flags.go:64] FLAG: --vmodule=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589309    3075 flags.go:64] FLAG: --volume-plugin-dir="/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589317    3075 flags.go:64] FLAG: --volume-stats-agg-period="1m0s"
...

<< no further occurence and/or execution of kubelet_cgroup_driver.sh can be found in the logs >>

...
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.012528    3075 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="kube-system/network-problem-detector-pod-hbxzb" podUID=19e95587-53db-436c-836a-846e4aef172c
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578677    3075 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\" instead: unknown"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578769    3075 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\" instead: unknown" pod="kube-system/calico-node-mnhkd"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578813    3075 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\" instead: unknown" pod="kube-system/calico-node-mnhkd"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578895    3075 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"calico-node-mnhkd_kube-system(e92075c7-ad42-431c-9d4f-ead1a33d586a)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"calico-node-mnhkd_kube-system(e92075c7-ad42-431c-9d4f-ead1a33d586a)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \\\"slice:prefix:name\\\" for systemd cgroups, got \\\"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\\\" instead: unknown\"" pod="kube-system/calico-node-mnhkd" podUID=e92075c7-ad42-431c-9d4f-ead1a33d586a
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580107    3075 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\" instead: unknown"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580189    3075 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\" instead: unknown" pod="kube-system/egress-filter-applier-jd57z"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580228    3075 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\" instead: unknown" pod="kube-system/egress-filter-applier-jd57z"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580301    3075 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"egress-filter-applier-jd57z_kube-system(965ebc20-3c9b-4956-9eea-0515f794f3fb)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"egress-filter-applier-jd57z_kube-system(965ebc20-3c9b-4956-9eea-0515f794f3fb)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \\\"slice:prefix:name\\\" for systemd cgroups, got \\\"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\\\" instead: unknown\"" pod="kube-system/egress-filter-applier-jd57z" podUID=965ebc20-3c9b-4956-9eea-0515f794f3fb
...

Systemd unit files

containerd

$ systemctl cat containerd.service

# /lib/systemd/system/containerd.service
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd

Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/containerd.service.d/10-configure-cgroup-driver.conf
[Service]
ExecStartPre=/opt/gardener/bin/containerd_cgroup_driver.sh

# /etc/systemd/system/containerd.service.d/10-require-containerd-initializer.conf
[Unit]
After=containerd-initializer.service
Requires=containerd-initializer.service

# /etc/systemd/system/containerd.service.d/11-exec_config.conf
[Service]
ExecStart=
ExecStart=/usr/bin/containerd --config=/etc/containerd/config.toml

# /etc/systemd/system/containerd.service.d/30-env_config.conf
[Service]
Environment="PATH=/var/bin/containerruntimes:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# /etc/systemd/system/containerd.service.d/override.conf
[Service]
LimitMEMLOCK=67108864

kubelet

$ systemctl cat kubelet.service

# /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet daemon
Documentation=https://kubernetes.io/docs/admin/kubelet
After=containerd.service

[Install]
WantedBy=multi-user.target

[Service]
Restart=always
RestartSec=5
EnvironmentFile=/etc/environment
EnvironmentFile=-/var/lib/kubelet/extra_args
ExecStartPre=/var/lib/kubelet/copy-kubernetes-binary.sh kubelet
ExecStart=/opt/bin/kubelet --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig-bootstrap \
    --config=/var/lib/kubelet/config/kubelet \
    --kubeconfig=/var/lib/kubelet/kubeconfig-real \
    --node-labels=worker.gardener.cloud/kubernetes-version=1.26.7 \
    --node-labels=kubernetes.io/arch=amd64 \
    --node-labels=networking.gardener.cloud/node-local-dns-enabled=true \
    --node-labels=node.kubernetes.io/role=node \
    --node-labels=worker.garden.sapcloud.io/group=worker-1 \
    --node-labels=worker.gardener.cloud/cri-name=containerd \
    --node-labels=worker.gardener.cloud/image-name=gardenlinux \
    --node-labels=worker.gardener.cloud/image-version=934.9.0 \
    --node-labels=worker.gardener.cloud/pool=worker-1 \
    --node-labels=worker.gardener.cloud/system-components=true \
    --container-runtime-endpoint=unix:///run/containerd/containerd.sock \
    --runtime-cgroups=/system.slice/containerd.service \
    --container-runtime=remote \
    --v=2 \
    $KUBELET_EXTRA_ARGS \
    --cloud-provider=external
ExecStartPre=/bin/sh -c 'hostnamectl set-hostname $(wget -q -O- --header "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/hostname | cut -d '.' -f 1)'

# /etc/systemd/system/kubelet.service.d/10-configure-cgroup-driver.conf
[Service]
ExecStartPre=/opt/gardener/bin/kubelet_cgroup_driver.sh
MrBatschner commented 1 year ago

From the logs we can deduce that:

MrBatschner commented 1 year ago

The script containerd_cgroup_driver.sh already contains code to trigger a kubelet restart in case the cgroup configuration changed.

The problem with the race is: we are only checking if kubelet.service is active (this - and reloading - are the only conditions when systemctl is-active would return with 0, ref). In the log above, we can find this:

Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: /run/containerd/io.containerd.runtime.v2.task/k8s.io does not exists - no tasks in k8s.io namespace
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: Configuring containerd cgroup driver to systemd
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3063]: activating

This last line is the output of systemctl is-active kubelet.service and at this very moment, it is in status activating - which would result in return code 3 (ref).

Things to check: