The problem was investigated and reproduced. It comes down to a probable race condition. The following can be found in the logs.
# journalctl -u containerd.service
Aug 10 12:47:54 shoot--project--cluster systemd[1]: Starting containerd container runtime...
Aug 10 12:47:54 shoot--project--cluster containerd[632]: time="2023-08-10T12:47:54.256491336Z" level=info msg="starting containerd" revision="1.6.20~ds1-1gardenlinux3" version="1.6.20~ds1"
...
Aug 10 12:47:54 shoot--project--cluster systemd[1]: Started containerd container runtime.
Aug 10 12:47:54 shoot--project--cluster containerd[632]: time="2023-08-10T12:47:54.338506007Z" level=info msg="containerd successfully booted in 0.089254s"
...
Aug 10 12:48:09 shoot--project--cluster containerd[632]: time="2023-08-10T12:48:09.099968316Z" level=warning msg="cleanup warnings time=\"2023-08-10T12:48:09Z\" level=info msg=\"starting signal loop\" namespace=moby pid=1025 runtime=io.containerd.runc.v2\n"
Aug 10 12:48:19 shoot--project--cluster systemd[1]: Stopping containerd container runtime...
Aug 10 12:48:19 shoot--project--cluster containerd[632]: time="2023-08-10T12:48:19.645043636Z" level=info msg="Stop CRI service"
Aug 10 12:48:19 shoot--project--cluster systemd[1]: containerd.service: Deactivated successfully.
Aug 10 12:48:19 shoot--project--cluster systemd[1]: Stopped containerd container runtime.
Aug 10 12:48:20 shoot--project--cluster systemd[1]: Starting containerd container runtime...
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: /run/containerd/io.containerd.runtime.v2.task/k8s.io does not exists - no tasks in k8s.io namespace
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: Configuring containerd cgroup driver to systemd
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3063]: activating
Aug 10 12:48:20 shoot--project--cluster containerd[3064]: time="2023-08-10T12:48:20.488933711Z" level=info msg="starting containerd" revision="1.6.20~ds1-1gardenlinux3" version="1.6.20~ds1"
...
Aug 10 12:48:20 shoot--project--cluster systemd[1]: Started containerd container runtime.
Aug 10 12:48:20 shoot--project--cluster containerd[3064]: time="2023-08-10T12:48:20.565626487Z" level=info msg="containerd successfully booted in 0.082965s"
...
# journalctl -u kubelet.service
Aug 10 12:48:19 shoot--project--cluster systemd[1]: Starting kubelet daemon...
Aug 10 12:48:19 shoot--project--cluster copy-kubernetes-binary.sh[2949]: Checking whether to copy new kubelet binary from hyperkube image to /opt/bin folder...
Aug 10 12:48:19 shoot--project--cluster copy-kubernetes-binary.sh[2949]: kubelet binary in /opt/bin is outdated (image used for last copy: ). Need to update it to eu.gcr.io/sap-se-gcr-k8s-public/eu_gcr_io/gardener-project/hyperkube@sha256:24a65f4e5f4a508f8c164cec9bd3dbde5fd263dd5816ed819f0d8d5210ffe8d5.
Aug 10 12:48:20 shoot--project--cluster kubelet_cgroup_driver.sh[3038]: kubelet and containerd are configured with the same cgroup driver (cgroupfs) - nothing to do
Aug 10 12:48:20 shoot--project--cluster systemd[1]: Started kubelet daemon.
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
...
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587896 3075 flags.go:64] FLAG: --address="0.0.0.0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587916 3075 flags.go:64] FLAG: --allowed-unsafe-sysctls="[]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587938 3075 flags.go:64] FLAG: --anonymous-auth="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587947 3075 flags.go:64] FLAG: --application-metrics-count-limit="100"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587956 3075 flags.go:64] FLAG: --authentication-token-webhook="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587964 3075 flags.go:64] FLAG: --authentication-token-webhook-cache-ttl="2m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587977 3075 flags.go:64] FLAG: --authorization-mode="AlwaysAllow"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587986 3075 flags.go:64] FLAG: --authorization-webhook-cache-authorized-ttl="5m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587993 3075 flags.go:64] FLAG: --authorization-webhook-cache-unauthorized-ttl="30s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.587999 3075 flags.go:64] FLAG: --azure-container-registry-config=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588005 3075 flags.go:64] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588013 3075 flags.go:64] FLAG: --bootstrap-kubeconfig="/var/lib/kubelet/kubeconfig-bootstrap"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588020 3075 flags.go:64] FLAG: --cert-dir="/var/lib/kubelet/pki"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588034 3075 flags.go:64] FLAG: --cgroup-driver="cgroupfs"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588043 3075 flags.go:64] FLAG: --cgroup-root=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588049 3075 flags.go:64] FLAG: --cgroups-per-qos="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588056 3075 flags.go:64] FLAG: --client-ca-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588062 3075 flags.go:64] FLAG: --cloud-config=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588068 3075 flags.go:64] FLAG: --cloud-provider="external"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588092 3075 flags.go:64] FLAG: --cluster-dns="[]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588114 3075 flags.go:64] FLAG: --cluster-domain=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588124 3075 flags.go:64] FLAG: --config="/var/lib/kubelet/config/kubelet"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588131 3075 flags.go:64] FLAG: --container-hints="/etc/cadvisor/container_hints.json"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588139 3075 flags.go:64] FLAG: --container-log-max-files="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588161 3075 flags.go:64] FLAG: --container-log-max-size="10Mi"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588167 3075 flags.go:64] FLAG: --container-runtime="remote"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588174 3075 flags.go:64] FLAG: --container-runtime-endpoint="unix:///run/containerd/containerd.sock"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588181 3075 flags.go:64] FLAG: --containerd="/run/containerd/containerd.sock"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588193 3075 flags.go:64] FLAG: --containerd-namespace="k8s.io"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588202 3075 flags.go:64] FLAG: --contention-profiling="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588209 3075 flags.go:64] FLAG: --cpu-cfs-quota="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588216 3075 flags.go:64] FLAG: --cpu-cfs-quota-period="100ms"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588223 3075 flags.go:64] FLAG: --cpu-manager-policy="none"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588229 3075 flags.go:64] FLAG: --cpu-manager-policy-options=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588238 3075 flags.go:64] FLAG: --cpu-manager-reconcile-period="10s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588249 3075 flags.go:64] FLAG: --enable-controller-attach-detach="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588255 3075 flags.go:64] FLAG: --enable-debugging-handlers="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588262 3075 flags.go:64] FLAG: --enable-load-reader="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588269 3075 flags.go:64] FLAG: --enable-server="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588275 3075 flags.go:64] FLAG: --enforce-node-allocatable="[pods]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588294 3075 flags.go:64] FLAG: --event-burst="10"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588300 3075 flags.go:64] FLAG: --event-qps="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588311 3075 flags.go:64] FLAG: --event-storage-age-limit="default=0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588318 3075 flags.go:64] FLAG: --event-storage-event-limit="default=0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588325 3075 flags.go:64] FLAG: --eviction-hard=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588336 3075 flags.go:64] FLAG: --eviction-max-pod-grace-period="0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588342 3075 flags.go:64] FLAG: --eviction-minimum-reclaim=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588348 3075 flags.go:64] FLAG: --eviction-pressure-transition-period="5m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588355 3075 flags.go:64] FLAG: --eviction-soft=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588372 3075 flags.go:64] FLAG: --eviction-soft-grace-period=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588379 3075 flags.go:64] FLAG: --exit-on-lock-contention="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588395 3075 flags.go:64] FLAG: --experimental-allocatable-ignore-eviction="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588403 3075 flags.go:64] FLAG: --experimental-mounter-path=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588409 3075 flags.go:64] FLAG: --fail-swap-on="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588416 3075 flags.go:64] FLAG: --feature-gates=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588424 3075 flags.go:64] FLAG: --file-check-frequency="20s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588434 3075 flags.go:64] FLAG: --global-housekeeping-interval="1m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588440 3075 flags.go:64] FLAG: --hairpin-mode="promiscuous-bridge"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588448 3075 flags.go:64] FLAG: --healthz-bind-address="127.0.0.1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588455 3075 flags.go:64] FLAG: --healthz-port="10248"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588462 3075 flags.go:64] FLAG: --help="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588487 3075 flags.go:64] FLAG: --hostname-override=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588496 3075 flags.go:64] FLAG: --housekeeping-interval="10s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588504 3075 flags.go:64] FLAG: --http-check-frequency="20s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588516 3075 flags.go:64] FLAG: --image-credential-provider-bin-dir=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588522 3075 flags.go:64] FLAG: --image-credential-provider-config=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588528 3075 flags.go:64] FLAG: --image-gc-high-threshold="85"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588535 3075 flags.go:64] FLAG: --image-gc-low-threshold="80"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588541 3075 flags.go:64] FLAG: --image-service-endpoint=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588557 3075 flags.go:64] FLAG: --iptables-drop-bit="15"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588565 3075 flags.go:64] FLAG: --iptables-masquerade-bit="14"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588575 3075 flags.go:64] FLAG: --keep-terminated-pod-volumes="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588582 3075 flags.go:64] FLAG: --kernel-memcg-notification="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588588 3075 flags.go:64] FLAG: --kube-api-burst="10"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588595 3075 flags.go:64] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588601 3075 flags.go:64] FLAG: --kube-api-qps="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588609 3075 flags.go:64] FLAG: --kube-reserved=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588618 3075 flags.go:64] FLAG: --kube-reserved-cgroup=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588628 3075 flags.go:64] FLAG: --kubeconfig="/var/lib/kubelet/kubeconfig-real"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588635 3075 flags.go:64] FLAG: --kubelet-cgroups=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588641 3075 flags.go:64] FLAG: --local-storage-capacity-isolation="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588648 3075 flags.go:64] FLAG: --lock-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588654 3075 flags.go:64] FLAG: --log-cadvisor-usage="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588660 3075 flags.go:64] FLAG: --log-flush-frequency="5s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588667 3075 flags.go:64] FLAG: --log-json-info-buffer-size="0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588681 3075 flags.go:64] FLAG: --log-json-split-stream="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588687 3075 flags.go:64] FLAG: --logging-format="text"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588704 3075 flags.go:64] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588713 3075 flags.go:64] FLAG: --make-iptables-util-chains="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588720 3075 flags.go:64] FLAG: --manifest-url=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588728 3075 flags.go:64] FLAG: --manifest-url-header=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588737 3075 flags.go:64] FLAG: --master-service-namespace="default"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588747 3075 flags.go:64] FLAG: --max-open-files="1000000"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588757 3075 flags.go:64] FLAG: --max-pods="110"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588764 3075 flags.go:64] FLAG: --maximum-dead-containers="-1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588770 3075 flags.go:64] FLAG: --maximum-dead-containers-per-container="1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588776 3075 flags.go:64] FLAG: --memory-manager-policy="None"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588782 3075 flags.go:64] FLAG: --minimum-container-ttl-duration="0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588789 3075 flags.go:64] FLAG: --minimum-image-ttl-duration="2m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588795 3075 flags.go:64] FLAG: --node-ip=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588805 3075 flags.go:64] FLAG: --node-labels="kubernetes.io/arch=amd64,networking.gardener.cloud/node-local-dns-enabled=true,node.kubernetes.io/role=node,worker.garden.sapcloud.io/group=worker-1,worker.gardener.cloud/cri-name=containerd,worker.gardener.cloud/image-name=gardenlinux,worker.gardener.cloud/image-version=934.9.0,worker.gardener.cloud/kubernetes-version=1.26.7,worker.gardener.cloud/pool=worker-1,worker.gardener.cloud/system-components=true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588869 3075 flags.go:64] FLAG: --node-status-max-images="50"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588876 3075 flags.go:64] FLAG: --node-status-update-frequency="10s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588883 3075 flags.go:64] FLAG: --oom-score-adj="-999"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588889 3075 flags.go:64] FLAG: --pod-cidr=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588906 3075 flags.go:64] FLAG: --pod-infra-container-image="registry.k8s.io/pause:3.9"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588917 3075 flags.go:64] FLAG: --pod-manifest-path=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588928 3075 flags.go:64] FLAG: --pod-max-pids="-1"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588934 3075 flags.go:64] FLAG: --pods-per-core="0"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588941 3075 flags.go:64] FLAG: --port="10250"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588950 3075 flags.go:64] FLAG: --protect-kernel-defaults="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588957 3075 flags.go:64] FLAG: --provider-id=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588963 3075 flags.go:64] FLAG: --qos-reserved=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588970 3075 flags.go:64] FLAG: --read-only-port="10255"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588979 3075 flags.go:64] FLAG: --register-node="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588986 3075 flags.go:64] FLAG: --register-schedulable="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.588992 3075 flags.go:64] FLAG: --register-with-taints=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589000 3075 flags.go:64] FLAG: --registry-burst="10"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589006 3075 flags.go:64] FLAG: --registry-qps="5"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589012 3075 flags.go:64] FLAG: --reserved-cpus=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589018 3075 flags.go:64] FLAG: --reserved-memory=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589030 3075 flags.go:64] FLAG: --resolv-conf="/etc/resolv.conf"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589039 3075 flags.go:64] FLAG: --root-dir="/var/lib/kubelet"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589056 3075 flags.go:64] FLAG: --rotate-certificates="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589064 3075 flags.go:64] FLAG: --rotate-server-certificates="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589071 3075 flags.go:64] FLAG: --runonce="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589080 3075 flags.go:64] FLAG: --runtime-cgroups="/system.slice/containerd.service"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589087 3075 flags.go:64] FLAG: --runtime-request-timeout="2m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589097 3075 flags.go:64] FLAG: --seccomp-default="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589104 3075 flags.go:64] FLAG: --serialize-image-pulls="true"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589115 3075 flags.go:64] FLAG: --storage-driver-buffer-duration="1m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589122 3075 flags.go:64] FLAG: --storage-driver-db="cadvisor"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589129 3075 flags.go:64] FLAG: --storage-driver-host="localhost:8086"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589135 3075 flags.go:64] FLAG: --storage-driver-password="root"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589141 3075 flags.go:64] FLAG: --storage-driver-secure="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589148 3075 flags.go:64] FLAG: --storage-driver-table="stats"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589159 3075 flags.go:64] FLAG: --storage-driver-user="root"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589165 3075 flags.go:64] FLAG: --streaming-connection-idle-timeout="4h0m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589174 3075 flags.go:64] FLAG: --sync-frequency="1m0s"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589180 3075 flags.go:64] FLAG: --system-cgroups=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589186 3075 flags.go:64] FLAG: --system-reserved=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589203 3075 flags.go:64] FLAG: --system-reserved-cgroup=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589213 3075 flags.go:64] FLAG: --tls-cert-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589223 3075 flags.go:64] FLAG: --tls-cipher-suites="[]"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589246 3075 flags.go:64] FLAG: --tls-min-version=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589252 3075 flags.go:64] FLAG: --tls-private-key-file=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589258 3075 flags.go:64] FLAG: --topology-manager-policy="none"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589265 3075 flags.go:64] FLAG: --topology-manager-policy-options=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589271 3075 flags.go:64] FLAG: --topology-manager-scope="container"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589278 3075 flags.go:64] FLAG: --v="2"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589291 3075 flags.go:64] FLAG: --version="false"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589301 3075 flags.go:64] FLAG: --vmodule=""
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589309 3075 flags.go:64] FLAG: --volume-plugin-dir="/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"
Aug 10 12:48:20 shoot--project--cluster kubelet[3075]: I0810 12:48:20.589317 3075 flags.go:64] FLAG: --volume-stats-agg-period="1m0s"
...
<< no further occurrence and/or execution of kubelet_cgroup_driver.sh can be found in the logs >>
...
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.012528 3075 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="kube-system/network-problem-detector-pod-hbxzb" podUID=19e95587-53db-436c-836a-846e4aef172c
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578677 3075 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\" instead: unknown"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578769 3075 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\" instead: unknown" pod="kube-system/calico-node-mnhkd"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578813 3075 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\" instead: unknown" pod="kube-system/calico-node-mnhkd"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.578895 3075 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"calico-node-mnhkd_kube-system(e92075c7-ad42-431c-9d4f-ead1a33d586a)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"calico-node-mnhkd_kube-system(e92075c7-ad42-431c-9d4f-ead1a33d586a)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \\\"slice:prefix:name\\\" for systemd cgroups, got \\\"/kubepods/burstable/pode92075c7-ad42-431c-9d4f-ead1a33d586a/e2d492398555eebb3e966f027acc5b5f28659192be361831c0ee20b1fbed6fe3\\\" instead: unknown\"" pod="kube-system/calico-node-mnhkd" podUID=e92075c7-ad42-431c-9d4f-ead1a33d586a
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580107 3075 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\" instead: unknown"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580189 3075 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\" instead: unknown" pod="kube-system/egress-filter-applier-jd57z"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580228 3075 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\" instead: unknown" pod="kube-system/egress-filter-applier-jd57z"
Aug 10 12:48:24 shoot--project--cluster kubelet[3075]: E0810 12:48:24.580301 3075 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"egress-filter-applier-jd57z_kube-system(965ebc20-3c9b-4956-9eea-0515f794f3fb)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"egress-filter-applier-jd57z_kube-system(965ebc20-3c9b-4956-9eea-0515f794f3fb)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \\\"slice:prefix:name\\\" for systemd cgroups, got \\\"/kubepods/burstable/pod965ebc20-3c9b-4956-9eea-0515f794f3fb/cfdc0497e8d169ff3a7d6566e6aa1c9b1287c091bf2f427ed3a56a729489bebd\\\" instead: unknown\"" pod="kube-system/egress-filter-applier-jd57z" podUID=965ebc20-3c9b-4956-9eea-0515f794f3fb
...
$ systemctl cat containerd.service
# /lib/systemd/system/containerd.service
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/containerd.service.d/10-configure-cgroup-driver.conf
[Service]
ExecStartPre=/opt/gardener/bin/containerd_cgroup_driver.sh
# /etc/systemd/system/containerd.service.d/10-require-containerd-initializer.conf
[Unit]
After=containerd-initializer.service
Requires=containerd-initializer.service
# /etc/systemd/system/containerd.service.d/11-exec_config.conf
[Service]
ExecStart=
ExecStart=/usr/bin/containerd --config=/etc/containerd/config.toml
# /etc/systemd/system/containerd.service.d/30-env_config.conf
[Service]
Environment="PATH=/var/bin/containerruntimes:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# /etc/systemd/system/containerd.service.d/override.conf
[Service]
LimitMEMLOCK=67108864
$ systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet daemon
Documentation=https://kubernetes.io/docs/admin/kubelet
After=containerd.service
[Install]
WantedBy=multi-user.target
[Service]
Restart=always
RestartSec=5
EnvironmentFile=/etc/environment
EnvironmentFile=-/var/lib/kubelet/extra_args
ExecStartPre=/var/lib/kubelet/copy-kubernetes-binary.sh kubelet
ExecStart=/opt/bin/kubelet --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig-bootstrap \
--config=/var/lib/kubelet/config/kubelet \
--kubeconfig=/var/lib/kubelet/kubeconfig-real \
--node-labels=worker.gardener.cloud/kubernetes-version=1.26.7 \
--node-labels=kubernetes.io/arch=amd64 \
--node-labels=networking.gardener.cloud/node-local-dns-enabled=true \
--node-labels=node.kubernetes.io/role=node \
--node-labels=worker.garden.sapcloud.io/group=worker-1 \
--node-labels=worker.gardener.cloud/cri-name=containerd \
--node-labels=worker.gardener.cloud/image-name=gardenlinux \
--node-labels=worker.gardener.cloud/image-version=934.9.0 \
--node-labels=worker.gardener.cloud/pool=worker-1 \
--node-labels=worker.gardener.cloud/system-components=true \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
--runtime-cgroups=/system.slice/containerd.service \
--container-runtime=remote \
--v=2 \
$KUBELET_EXTRA_ARGS \
--cloud-provider=external
ExecStartPre=/bin/sh -c 'hostnamectl set-hostname $(wget -q -O- --header "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/hostname | cut -d '.' -f 1)'
# /etc/systemd/system/kubelet.service.d/10-configure-cgroup-driver.conf
[Service]
ExecStartPre=/opt/gardener/bin/kubelet_cgroup_driver.sh
From the logs we can deduce that:
- kubelet_cgroup_driver.sh runs while containerd is still configured with cgroupfs and finds kubelet and containerd on the same cgroup driver (cgroupfs) - hence, it configures the kubelet to run with cgroupfs
- containerd_cgroup_driver.sh then finds no tasks in the k8s.io namespace and reconfigures containerd to the systemd cgroup driver, while the kubelet keeps running with cgroupfs
The script containerd_cgroup_driver.sh already contains code to trigger a kubelet restart in case the cgroup configuration changed. The problem with the race is: we are only checking if kubelet.service is active (this - and reloading - are the only conditions in which systemctl is-active would return with 0, ref). In the log above, we can find this:
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: /run/containerd/io.containerd.runtime.v2.task/k8s.io does not exists - no tasks in k8s.io namespace
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3041]: Configuring containerd cgroup driver to systemd
Aug 10 12:48:20 shoot--project--cluster containerd_cgroup_driver.sh[3063]: activating
This last line is the output of systemctl is-active kubelet.service and at this very moment, it is in status activating - which would result in return code 3 (ref).
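For illustration, the race-prone restart logic boils down to something like the following (a minimal sketch, not the actual content of containerd_cgroup_driver.sh; the echo message and restart command are assumptions):
# Sketch only: systemctl is-active exits 0 for "active" (and "reloading").
# A kubelet that is still "activating" at this point makes is-active exit
# with 3, so the restart branch is skipped and the kubelet keeps running
# with the old cgroup driver.
if systemctl is-active --quiet kubelet.service; then
    echo "containerd cgroup driver changed - restarting kubelet"
    systemctl restart kubelet.service
fi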
Things to check:
- Can PartOf= be used for the kubelet to have it automatically restarted with containerd by systemd - and if so, would it have any side-effects? (See the sketch below.)
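A minimal sketch of what such a drop-in could look like, assuming PartOf= turns out to be the mechanism of choice (the file name is hypothetical):
# /etc/systemd/system/kubelet.service.d/20-partof-containerd.conf (hypothetical)
[Unit]
# PartOf= propagates stop and restart jobs from containerd.service to
# kubelet.service, so a containerd restart after a cgroup driver change would
# also restart the kubelet. Whether this has unwanted side-effects (e.g. a
# kubelet restart on every containerd restart) is exactly what needs checking.
PartOf=containerd.service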
How to categorize this issue?
/area os
/kind bug
/os garden-linux
What happened:
The scripts introduced in PR #82 and PR #85 are supposed to configure containerd's cgroup driver according to the cgroup version of the underlying host and to make sure that the kubelet is configured with the corresponding cgroup driver. Both reconfigurations happen by running a script in the respective component's ExecStartPre= before it comes up.
Since these changes were released to Gardener landscapes, they have proven reliable most of the time. However, we came across at least one instance of a new node with Garden Linux 934.7 (and hence with cgroup v2) that was scheduled for an existing worker pool and failed to join the cluster due to an inconsistent cgroup driver configuration: containerd ran with the systemd cgroup driver but the kubelet still ran with the cgroupfs cgroup driver.
From the logs of the internal support ticket:
These log messages are a strong pointer to the kubelet creating a classic cgroupfs hierarchy which containerd then has no clue what to do with.
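For reference, the two settings that must agree can be inspected on an affected node roughly like this (illustrative commands; the paths are taken from the unit files above, and the kubelet's driver may also be set via the --cgroup-driver flag seen in the flag dump):
# containerd side: SystemdCgroup option of the runc runtime (CRI plugin)
grep -n 'SystemdCgroup' /etc/containerd/config.toml
# kubelet side: cgroupDriver field of the KubeletConfiguration
grep -n 'cgroupDriver' /var/lib/kubelet/config/kubelet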
What you expected to happen:
Containerd and kubelet are always running with the same cgroup driver, and log messages like the ones given above do not occur.
How to reproduce it (as minimally and precisely as possible):
Steps to reproduce need to be investigated. It seems that this is caused by a timing issue or a race condition.