Open smileusd opened 2 years ago
Friendly reminder: please make sure to hide the Docker registry username and password, @smileusd.
@smileusd I don't really like it when people post "I'm having this problem too", but in this case I feel I need to, as I have had this problem for many weeks and this is the first report I've seen.
I wasn't sure whether this was a containerd problem or a Linux problem.
In my case, I'm seeing this with k3s on Alpine Linux, so no systemd is involved. It seems containerd is looking for the cgroup files in the wrong place?
So when I try to exec into a running container:
host:/# crictl exec -ti a1d258cd01eb1 bash
FATA[0000] execing command in container: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "8eabd87df6d9b37b935e7d5b9c439ddab304f4013a46289915b280ca6bc23655": OCI runtime exec failed: exec failed: unable to start container process: error adding pid 5788 to cgroups: failed to write 5788: open /sys/fs/unified/kubepods/burstable/pod9f0a88d8-fb33-402c-ae50-06bfeca4dd40/a1d258cd01eb125761a564511b2fd5eff0ff0aabf6356ee6e9dbd93ece40f673/cgroup.procs: no such file or directory: unknown
My cgroups are mounted as follows:
host:/# mount | grep cgroup
cgroup on /sys/fs/cgroup type cgroup (rw,relatime,cpuset,cpu,cpuacct,blkio,memory,devices,freezer,net_cls,perf_event,net_prio,pids)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
So the cgroup2 hierarchy is mounted at /sys/fs/cgroup/unified, but containerd seems to be looking in /sys/fs/unified/?
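To make the mismatch above concrete, here is a minimal sketch that parses `mount` output, finds where cgroup2 is mounted, and checks whether the path from the error message lies under that mount point. The helper names (`cgroup2_mountpoints`, `is_under`) are made up for illustration; the sample strings are copied from the output above.

```python
import re

# Matches lines printed by `mount`, for example:
# "none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,...)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def cgroup2_mountpoints(mount_output):
    """Return the mount point of every cgroup2 filesystem in `mount` output."""
    points = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group(3) == "cgroup2":
            points.append(match.group(2))
    return points


def is_under(path, mountpoint):
    """True if `path` lies inside `mountpoint`, compared by path component."""
    mountpoint = mountpoint.rstrip("/")
    return path == mountpoint or path.startswith(mountpoint + "/")


# Sample data copied from this report.
MOUNTS = """\
cgroup on /sys/fs/cgroup type cgroup (rw,relatime,cpuset,cpu,cpuacct,blkio,memory,devices,freezer,net_cls,perf_event,net_prio,pids)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
"""

FAILED_PATH = (
    "/sys/fs/unified/kubepods/burstable/"
    "pod9f0a88d8-fb33-402c-ae50-06bfeca4dd40/"
    "a1d258cd01eb125761a564511b2fd5eff0ff0aabf6356ee6e9dbd93ece40f673/"
    "cgroup.procs"
)

if __name__ == "__main__":
    for point in cgroup2_mountpoints(MOUNTS):
        print(point, "contains failed path:", is_under(FAILED_PATH, point))
```

This only compares strings; it does not explain why the runtime builds the path that way, which is the open question here.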
crictl -v
crictl version v1.25.0-k3s1
w4:/home/dn# crictl info
{
"status": {
"conditions": [
{
"type": "RuntimeReady",
"status": true,
"reason": "",
"message": ""
},
{
"type": "NetworkReady",
"status": true,
"reason": "",
"message": ""
}
]
},
"cniconfig": {
"PluginDirs": [
"/var/lib/rancher/k3s/data/a6de074690df59f81a5341041635a72720bd8ab6f08a15e362dfe33b896b4de9/bin"
],
"PluginConfDir": "/var/lib/rancher/k3s/agent/etc/cni/net.d",
"PluginMaxConfNum": 1,
"Prefix": "eth",
"Networks": [
{
"Config": {
"Name": "cni-loopback",
"CNIVersion": "0.3.1",
"Plugins": [
{
"Network": {
"type": "loopback",
"ipam": {},
"dns": {}
},
"Source": "{\"type\":\"loopback\"}"
}
],
"Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n \"type\": \"loopback\"\n}]\n}"
},
"IFName": "lo"
},
{
"Config": {
"Name": "cbr0",
"CNIVersion": "1.0.0",
"Plugins": [
{
"Network": {
"type": "flannel",
"ipam": {},
"dns": {}
},
"Source": "{\"delegate\":{\"forceAddress\":true,\"hairpinMode\":true,\"isDefaultGateway\":true},\"type\":\"flannel\"}"
},
{
"Network": {
"type": "portmap",
"capabilities": {
"portMappings": true
},
"ipam": {},
"dns": {}
},
"Source": "{\"capabilities\":{\"portMappings\":true},\"type\":\"portmap\"}"
}
],
"Source": "{\n \"name\":\"cbr0\",\n \"cniVersion\":\"1.0.0\",\n \"plugins\":[\n {\n \"type\":\"flannel\",\n \"delegate\":{\n \"hairpinMode\":true,\n \"forceAddress\":true,\n \"isDefaultGateway\":true\n }\n },\n {\n \"type\":\"portmap\",\n \"capabilities\":{\n \"portMappings\":true\n }\n }\n ]\n}\n"
},
"IFName": "eth0"
}
]
},
"config": {
"containerd": {
"snapshotter": "overlayfs",
"defaultRuntimeName": "runc",
"defaultRuntime": {
"runtimeType": "",
"runtimePath": "",
"runtimeEngine": "",
"PodAnnotations": null,
"ContainerAnnotations": null,
"runtimeRoot": "",
"options": null,
"privileged_without_host_devices": false,
"baseRuntimeSpec": "",
"cniConfDir": "",
"cniMaxConfNum": 0
},
"untrustedWorkloadRuntime": {
"runtimeType": "",
"runtimePath": "",
"runtimeEngine": "",
"PodAnnotations": null,
"ContainerAnnotations": null,
"runtimeRoot": "",
"options": null,
"privileged_without_host_devices": false,
"baseRuntimeSpec": "",
"cniConfDir": "",
"cniMaxConfNum": 0
},
"runtimes": {
"runc": {
"runtimeType": "io.containerd.runc.v2",
"runtimePath": "",
"runtimeEngine": "",
"PodAnnotations": null,
"ContainerAnnotations": null,
"runtimeRoot": "",
"options": {
"SystemdCgroup": false
},
"privileged_without_host_devices": false,
"baseRuntimeSpec": "",
"cniConfDir": "",
"cniMaxConfNum": 0
}
},
"noPivot": false,
"disableSnapshotAnnotations": true,
"discardUnpackedLayers": false,
"ignoreRdtNotEnabledErrors": false
},
"cni": {
"binDir": "/var/lib/rancher/k3s/data/a6de074690df59f81a5341041635a72720bd8ab6f08a15e362dfe33b896b4de9/bin",
"confDir": "/var/lib/rancher/k3s/agent/etc/cni/net.d",
"maxConfNum": 1,
"confTemplate": "",
"ipPref": ""
},
"registry": {
"configPath": "",
"mirrors": {
"registry.apps.xbe": {
"endpoint": [
"http://registry.apps.xbe"
],
"rewrite": null
}
},
"configs": null,
"auths": null,
"headers": null
},
"imageDecryption": {
"keyModel": "node"
},
"disableTCPService": true,
"streamServerAddress": "127.0.0.1",
"streamServerPort": "10010",
"streamIdleTimeout": "4h0m0s",
"enableSelinux": false,
"selinuxCategoryRange": 1024,
"sandboxImage": "rancher/mirrored-pause:3.6",
"statsCollectPeriod": 10,
"systemdCgroup": false,
"enableTLSStreaming": false,
"x509KeyPairStreaming": {
"tlsCertFile": "",
"tlsKeyFile": ""
},
"maxContainerLogSize": 16384,
"disableCgroup": false,
"disableApparmor": false,
"restrictOOMScoreAdj": false,
"maxConcurrentDownloads": 3,
"disableProcMount": false,
"unsetSeccompProfile": "",
"tolerateMissingHugetlbController": true,
"disableHugetlbController": true,
"device_ownership_from_security_context": false,
"ignoreImageDefinedVolumes": false,
"netnsMountsUnderStateDir": false,
"enableUnprivilegedPorts": true,
"enableUnprivilegedICMP": true,
"containerdRootDir": "/var/lib/rancher/k3s/agent/containerd",
"containerdEndpoint": "/run/k3s/containerd/containerd.sock",
"rootDir": "/var/lib/rancher/k3s/agent/containerd/io.containerd.grpc.v1.cri",
"stateDir": "/run/k3s/containerd/io.containerd.grpc.v1.cri"
},
"golang": "go1.19.1",
"lastCNILoadStatus": "OK",
"lastCNILoadStatus.default": "OK"
}
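Since the dump above is long, a small sketch like the following could pull out just the cgroup-driver flags. It assumes the JSON has the same shape as the output above; the function name `cgroup_driver_settings` is hypothetical, and the sample is a trimmed copy of that output.

```python
import json


def cgroup_driver_settings(info):
    """Collect the systemd-cgroup flags from parsed `crictl info` output."""
    config = info.get("config", {})
    runtimes = config.get("containerd", {}).get("runtimes", {})
    per_runtime = {}
    for name, runtime in runtimes.items():
        # "options" is null for some runtime entries, so fall back to {}.
        options = runtime.get("options") or {}
        per_runtime[name] = options.get("SystemdCgroup")
    return {
        "systemdCgroup": config.get("systemdCgroup"),
        "runtimes": per_runtime,
    }


# Trimmed copy of the `crictl info` output shown above.
SAMPLE = json.loads("""
{
  "config": {
    "containerd": {
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "options": {"SystemdCgroup": false}
        }
      }
    },
    "systemdCgroup": false
  }
}
""")

if __name__ == "__main__":
    print(cgroup_driver_settings(SAMPLE))
```

Both flags are false in this output, which matches a node that has no systemd.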
uname -a
Linux w4 5.15.72-0-rpi4 #1-Alpine SMP PREEMPT Mon Oct 10 13:00:16 UTC 2022 aarch64 Linux
Description
We use containerd as the container runtime, with systemd as the cgroup driver (cgroup v1). After a node upgrade on our cluster, we found that we could not exec into some pods because their cgroup directory was not found:
From the system log, we found that systemd deleted these containers when containerd was upgraded:
However, I do not see any containerd log related to the container deletion. Because only the systemd directories (unified and systemd) were lost, I also reported an issue on systemd: https://github.com/systemd/systemd/issues/24858. I am not sure what happened at that time; the only thing we can confirm is that it happened during the containerd upgrade.
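One way to confirm which hierarchies lost a container's cgroup directory is sketched below. It assumes the usual cgroup v1 / hybrid layout in which each hierarchy is a directory under one root (typically /sys/fs/cgroup); the function name and the example path in the usage note are hypothetical.

```python
import os


def missing_hierarchies(cgroup_root, relative_path):
    """Return the hierarchies under cgroup_root that lack relative_path.

    Assumes each hierarchy (memory, cpu, systemd, unified, ...) is a
    directory directly under cgroup_root.
    """
    relative_path = relative_path.lstrip("/")
    missing = []
    for name in sorted(os.listdir(cgroup_root)):
        hierarchy = os.path.join(cgroup_root, name)
        # Skip symlinks (e.g. cpu -> cpu,cpuacct) and plain files.
        if os.path.islink(hierarchy) or not os.path.isdir(hierarchy):
            continue
        if not os.path.isdir(os.path.join(hierarchy, relative_path)):
            missing.append(name)
    return missing
```

For example, `missing_hierarchies("/sys/fs/cgroup", "kubepods.slice/example.scope")` (a placeholder path, not one taken from this report) would list `systemd` and `unified` if only those two directories were removed, as described above.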
Steps to reproduce the issue
I have not been able to reproduce the issue so far.
Describe the results you received and expected
I expected the cgroup entries to still exist.
What version of containerd are you using?
v1.6.6
Any other relevant information
runc version 1.1.2, systemd 245, Ubuntu 20.04.4, Linux kernel 5.4.0-96-generic
Show configuration if it is related to CRI plugin.