steven-sheehy closed this issue 5 years ago
@steven-sheehy What is your systemd service file?
This usually happens because some leaked processes in the container are still holding the container's io. We fixed a bug related to this in 1.2.1, and it shouldn't happen again.
However, if systemd is not configured correctly, it may move container processes out of their original cgroup. In that case, some of runc's cleanup logic breaks, which can cause process leakage.
@Random-Liu It's the same one that comes with the release tarball.
$ cat /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
[Install]
WantedBy=multi-user.target
My kubelet args if it helps:
/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-log-max-files=2 --container-log-max-size=100Mi --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --runtime-request-timeout=15m
systemctl status shows all the processes in the /system.slice/containerd.service cgroup:
$ systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-01-03 19:54:41 UTC; 2h 1min ago
Docs: https://containerd.io
Process: 242537 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 242538 (containerd)
Tasks: 296 (limit: 4915)
CGroup: /system.slice/containerd.service
├─ 11093 sleep 5
├─ 95283 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/fbd8d3488cb4fb18706df86d1aa94c2a2fff737b16ffb44062576191c974f1cf -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 95300 /pause
├─ 95374 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/cafb2a6f981903dff30007c3329b9f4c29b452bbe6ab4fddc0738adee3c6cd28 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 95539 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/3963deae9861bd99f1fde35819ecbf8642098e03a742bd1fab7f316495645f85 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 95561 /pause
├─ 95669 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/fab4e9c2bcadf1a652fe840cdc4a87c46d08e856ce3568bb8ed9e7f2c18ed350 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 95683 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/e399107ecc5c9705e1fd1a84ab32be2ac369e7d4fd1f1df41a479d956903f25f -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 95708 /pause
├─ 96014 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/9d0374d05c5de01b86fb620c846b9878dd17c8ccfb6e233a847e6e373b39b362 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 96032 /pause
├─ 96092 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/5e467b900655fa9bfe5157d329b61b9b86d8fdec5e6a18a081365df69ea633b6 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 96218 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/264a6a17b38e19a5bcc5c7b18b078efe5b103a596d4340a67bf18cb16d77bc41 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 96251 /pause
├─ 96435 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/013d12bc9156460a60795ef4d571dba4c96a57e3ae2f3b76beec57cacd604727 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 96454 /bin/sh -c sfacctd -f /etc/pmacct/sflow_replicator.conf -F /var/run/pmacct/sfacctd_tee.pid; sfacctd -f /etc/pmacct/sflow_receiver.conf -F /var/run/pmacct/sfacctd.pid
├─ 96467 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/e2d1a384e7b31bfca54d60ee1e7829699a5251fc0ae3cfc1794072f0183f8533 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 96489 /pause
├─ 96529 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/616dcf6c787c956a34f93f19f907d0d59a1b9d003afed92a2ca86bf327c54137 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/container
├─ 96561 /pause
...
Should I be running kubelet with --cgroup-driver set to systemd or cgroupfs? Since I don't set it manually, I think it uses cgroupfs.
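One way to confirm the effective driver (a sketch; it assumes kubelet reads its KubeletConfiguration from /var/lib/kubelet/config.yaml, as the flags above suggest, and the `check_cgroup_driver` helper name is made up):

```shell
# Hypothetical helper: report the cgroupDriver setting from a KubeletConfiguration file.
# When the key is absent, kubelet falls back to its default driver, cgroupfs.
check_cgroup_driver() {
  local cfg="$1"
  grep '^cgroupDriver:' "$cfg" 2>/dev/null || echo "cgroupDriver not set (defaults to cgroupfs)"
}

check_cgroup_driver /var/lib/kubelet/config.yaml
```

Whichever driver kubelet ends up using has to match the runtime's cgroup setup, otherwise container processes can land in cgroups the runtime's cleanup logic doesn't expect.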
@steven-sheehy If that is the case, this should be a separate issue.
Let me check what else could go wrong.
@steven-sheehy
1) We need to first find which container causes this issue, and check its status.
Can you run ctr -n=k8s.io containers ls | grep -v pause to list all containers? Containerd always returns containers in sorted order; 3 containers have been successfully loaded, so the 4th one should be the bad one. We need the container id.
2) We need to understand why the io is stuck for the container after it is stopped.
If I understand correctly, there should be some processes belonging to the container left running. We need to find those processes and check their state.
a) We can first check whether runc still sees the container: /usr/local/sbin/runc --root=/var/run/containerd/runc/k8s.io list. If runc still sees it, we can use /usr/local/sbin/runc --root=/var/run/containerd/runc/k8s.io state ${CONTAINER_ID} to get some debug information.
b) Check the container cgroup to see what is inside. You can run find /sys/fs/cgroup | grep ${CONTAINER_ID} to find the cgroup, and then check cgroup.procs to see which processes are inside.
c) If you know what the container is doing, it would be helpful to find all processes inside the container, e.g. with pstree.
With this information we should be able to find which process is left running, and figure out why it was not stopped.
If no processes leaked, that would be even weirder...
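Step b) can be wrapped in one small helper that merges the cgroup.procs files across every hierarchy (a sketch; `leaked_pids` is a made-up name, and the cgroup mount point is a parameter rather than hard-coded). Note it uses `sort -un` rather than plain `uniq`, since `uniq` only collapses adjacent duplicates:

```shell
# Hypothetical helper: list the distinct PIDs still charged to a container's cgroups.
# root: the cgroup mount point (e.g. /sys/fs/cgroup); cid: the container id.
leaked_pids() {
  local root="$1" cid="$2"
  # Each cgroup v1 hierarchy (pids, memory, freezer, ...) keeps its own cgroup.procs;
  # concatenate them all and de-duplicate numerically.
  find "$root" -path "*/${cid}/cgroup.procs" -exec cat {} + 2>/dev/null | sort -un
}

leaked_pids /sys/fs/cgroup "${CONTAINER_ID}"
```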
Thanks for the detailed steps. Hopefully I got this right:
sudo ctr -n=k8s.io containers ls | grep -v pause | head -n 5
CONTAINER IMAGE RUNTIME
013d12bc9156460a60795ef4d571dba4c96a57e3ae2f3b76beec57cacd604727 registry.gitlab.com/firescope/stratis/pmacct/edge-pmacct:4.0.0-rc1 io.containerd.runtime.v1.linux
0f65efaa107732926ac247073b21ea8f012666d2f312d5726ddce96e8b4ea126 sha256:097dd2748285f27a6049caeb404bf5cbeaed8ce5c5fb71f734cf6f125abc9fe6 io.containerd.runtime.v1.linux
10b309d5d2945eee4cabe23346a56ab3e7c037a1ff21e6d5646e98b5d3fdfbf0 sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e io.containerd.runtime.v1.linux
14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841 sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e io.containerd.runtime.v1.linux
$ sudo /usr/local/sbin/runc --root=/var/run/containerd/runc/k8s.io list | grep 14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841 187411 running /run/containerd/io.containerd.runtime.v1.linux/k8s.io/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841 2018-12-03T03:28:42.182943096Z root
$ sudo /usr/local/sbin/runc --root=/var/run/containerd/runc/k8s.io state 14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
{
"ociVersion": "1.0.0",
"id": "14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841",
"pid": 187411,
"status": "running",
"bundle": "/run/containerd/io.containerd.runtime.v1.linux/k8s.io/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841",
"rootfs": "/run/containerd/io.containerd.runtime.v1.linux/k8s.io/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/rootfs",
"created": "2018-12-03T03:28:42.182943096Z",
"annotations": {
"io.kubernetes.cri.container-type": "sandbox",
"io.kubernetes.cri.sandbox-id": "14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841"
},
"owner": ""
$ find /sys/fs/cgroup | grep 14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/pids.current
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/pids.max
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/pids.events
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/pids/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/hugetlb.2MB.failcnt
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/hugetlb.2MB.max_usage_in_bytes
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/hugetlb.2MB.usage_in_bytes
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/hugetlb/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/hugetlb.2MB.limit_in_bytes
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.sectors
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_serviced
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.time_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.throttle.read_bps_device
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.throttle.write_bps_device
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.weight_device
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_queued
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.throttle.write_iops_device
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_merged
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.time
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_service_bytes_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_wait_time
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.sectors_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_service_time_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.leaf_weight
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.weight
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_serviced_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_service_bytes
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_queued_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.leaf_weight_device
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.throttle.io_service_bytes
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.throttle.io_serviced
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.reset_stats
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_merged_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_service_time
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.io_wait_time_recursive
/sys/fs/cgroup/blkio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/blkio.throttle.read_iops_device
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.use_hierarchy
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.tcp.max_usage_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.slabinfo
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.tcp.usage_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.failcnt
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.force_empty
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.max_usage_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.event_control
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.swappiness
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.limit_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.usage_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.pressure_level
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.max_usage_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.tcp.limit_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.stat
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.numa_stat
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.tcp.failcnt
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.oom_control
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.kmem.limit_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.soft_limit_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.failcnt
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.usage_in_bytes
/sys/fs/cgroup/memory/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/memory.move_charge_at_immigrate
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/freezer.parent_freezing
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/freezer.state
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/freezer/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/freezer.self_freezing
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.mems
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.sched_relax_domain_level
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.mem_exclusive
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.memory_pressure
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.cpus
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.mem_hardwall
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.memory_migrate
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.memory_spread_page
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.sched_load_balance
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.cpu_exclusive
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.effective_mems
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.effective_cpus
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuset.memory_spread_slab
/sys/fs/cgroup/cpuset/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/devices.list
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/devices.allow
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/devices/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/devices.deny
/sys/fs/cgroup/perf_event/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/perf_event/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/perf_event/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/perf_event/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/perf_event/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpu.stat
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.usage
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.stat
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.usage_percpu_sys
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.usage_user
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.usage_percpu_user
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.usage_percpu
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpu.cfs_period_us
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpu.cfs_quota_us
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpu.shares
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.usage_sys
/sys/fs/cgroup/cpu,cpuacct/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cpuacct.usage_all
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/net_prio.ifpriomap
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/net_cls.classid
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
/sys/fs/cgroup/net_cls,net_prio/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/net_prio.prioidx
/sys/fs/cgroup/systemd/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
/sys/fs/cgroup/systemd/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.clone_children
/sys/fs/cgroup/systemd/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/tasks
/sys/fs/cgroup/systemd/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/notify_on_release
/sys/fs/cgroup/systemd/kubepods/pod6a6f0656-d640-11e8-a966-00505691e4af/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841/cgroup.procs
$ find /sys/fs/cgroup | grep 14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841 | grep "cgroup.procs" | xargs cat | uniq
187411
$ ps -aef | grep 187411
firesco+ 26482 230337 0 23:30 pts/1 00:00:00 grep --color=auto 187411
root 187411 187386 0 2018 ? 00:00:00 /pause
$ ps -aef | grep 187386
firesco+ 26345 230337 0 23:29 pts/1 00:00:00 grep --color=auto 187386
root 187386 1 0 2018 ? 00:01:21 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/containerd
root 187411 187386 0 2018 ? 00:00:00 /pause
Looks like there's no container in this pod, just the pause. How do I find out what the pod name is?
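One way to answer that question, assuming crictl is configured against the containerd CRI socket and jq is installed (pod_for_sandbox is just a hypothetical helper name, not an official tool), is to ask crictl for the sandbox status:

```shell
# Map a sandbox (pause container) ID to its pod, printed as "namespace/name".
# Assumes crictl is pointed at the containerd CRI socket and jq is installed.
pod_for_sandbox() {
  crictl inspectp "$1" | jq -r '.status.metadata | "\(.namespace)/\(.name)"'
}

# Usage:
#   pod_for_sandbox 14a2191736e10e8023b36f851b47ea758c227d1feef40ce94f003ef08f87f841
```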
@steven-sheehy It seems that your pause image is sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e, so you may want to grep -v sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e instead of grep -v pause.
Your current list includes the sandbox (pause container); that is why you see the /pause process.
Let's try this again:
$ sudo ctr -n=k8s.io containers ls | grep -v pause | grep -v "sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e" | head -n 5
CONTAINER IMAGE RUNTIME
013d12bc9156460a60795ef4d571dba4c96a57e3ae2f3b76beec57cacd604727 registry.gitlab.com/firescope/stratis/pmacct/edge-pmacct:4.0.0-rc1 io.containerd.runtime.v1.linux
0f65efaa107732926ac247073b21ea8f012666d2f312d5726ddce96e8b4ea126 sha256:097dd2748285f27a6049caeb404bf5cbeaed8ce5c5fb71f734cf6f125abc9fe6 io.containerd.runtime.v1.linux
290fb9bbf983bee091c3bf50676df4648319e347c38c9dcc533137e459be35b9 registry.gitlab.com/firescope/stratis/edge/edge-inventory:4.0.0-rc9 io.containerd.runtime.v1.linux
30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1 sha256:a5103f96993a0024428cd52663b85081163e66d8f24e7ac26a18555f432006b2 io.containerd.runtime.v1.linux
$ sudo /usr/local/sbin/runc --root=/var/run/containerd/runc/k8s.io list | grep 30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1
30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1 0 stopped /run/containerd/io.containerd.runtime.v1.linux/k8s.io/30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1 2018-12-03T03:28:30.827234075Z root
$ sudo /usr/local/sbin/runc --root=/var/run/containerd/runc/k8s.io state 30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1
{
"ociVersion": "1.0.0",
"id": "30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1",
"pid": 0,
"status": "stopped",
"bundle": "/run/containerd/io.containerd.runtime.v1.linux/k8s.io/30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1",
"rootfs": "/run/containerd/io.containerd.runtime.v1.linux/k8s.io/30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1/rootfs",
"created": "2018-12-03T03:28:30.827234075Z",
"annotations": {
"io.kubernetes.cri.container-type": "container",
"io.kubernetes.cri.sandbox-id": "49539e8ac5a735c8946d683198957962ec0e186a9774541d8bb7c3fae49342ef"
},
"owner": ""
}
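As an aside, the two fields that matter in that state JSON (the status, and the sandbox the container belongs to) can be pulled out with a small filter. This is only a sketch, assuming jq is installed; state_summary is a hypothetical name:

```shell
# Summarize a runc state JSON (read from stdin) as "<status> <sandbox-id>",
# which makes it easy to match stopped-but-not-deleted containers to pods.
state_summary() {
  jq -r '"\(.status) \(.annotations["io.kubernetes.cri.sandbox-id"])"'
}

# Usage (as root):
#   runc --root=/var/run/containerd/runc/k8s.io state <container-id> | state_summary
```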
$ find /sys/fs/cgroup | grep 30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1 | grep cgroup.procs | xargs cat | sort -u
186522
186685
$ ps -aef | grep 186522
root 186522 186398 0 2018 ? 01:03:56 /home/weave/weaver --port=6783 --datapath=datapath --name=d2:44:84:04:ea:42 --host-root=/host --http-addr=127.0.0.1:6784 --metrics-addr=0.0.0.0:6782 --docker-api= --no-dns --db-prefix=/weavedb/weave-net --ipalloc-range=10.214.0.0/16 --nickname=qaedge1fce1-2 --ipalloc-init consensus=3 --conn-limit=100 --expect-npc 10.0.22.153 10.0.22.154 10.0.22.152
$ ps -aef | grep 186398
root 186398 1 0 2018 ? 00:01:20 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1 -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/containerd
root 186522 186398 0 2018 ? 01:03:56 /home/weave/weaver --port=6783 --datapath=datapath --name=d2:44:84:04:ea:42 --host-root=/host --http-addr=127.0.0.1:6784 --metrics-addr=0.0.0.0:6782 --docker-api= --no-dns --db-prefix=/weavedb/weave-net --ipalloc-range=10.214.0.0/16 --nickname=qaedge1fce1-2 --ipalloc-init consensus=3 --conn-limit=100 --expect-npc 10.0.22.153 10.0.22.154 10.0.22.152
root 186685 186398 0 2018 ? 00:16:35 /home/weave/kube-utils -run-reclaim-daemon -node-name=qaedge1fce1-2 -peer-name=d2:44:84:04:ea:42 -log-level=debug
$ pstree 186398
containerd-shim─┬─kube-utils───21*[{kube-utils}]
├─weaver───32*[{weaver}]
└─9*[{containerd-shim}]
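The find-the-leaked-PIDs pipeline used above generalizes to any container ID. A minimal sketch (CGROOT is a hypothetical override so the helper can be exercised against a test directory instead of the real /sys/fs/cgroup):

```shell
# List the unique PIDs still charged to any cgroup belonging to a container.
# CGROOT defaults to the real cgroup filesystem; override it for testing.
leaked_pids() {
  find "${CGROOT:-/sys/fs/cgroup}" -path "*$1*" -name cgroup.procs 2>/dev/null \
    | xargs -r cat | sort -un
}

# Usage:
#   leaked_pids 30f2f8fd279d09abf726f7eacc0d180f43db06c6fc80680ca70e9777e564b8e1
```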
So it looks like it is Weave, which makes sense since that is the last thing in the containerd logs.
@steven-sheehy Yeah, this is the problem.
You see that runc reports the container as stopped (because the init process is dead). However, there are still processes running that hold the IO, which blocks the container deletion.
Did you upgrade containerd from some older version to 1.2.1? And was the weave container running before the upgrade?
If that is the case, the running containerd-shim instance will be an old containerd-shim that doesn't include the fix https://github.com/containerd/containerd/pull/2854, and containerd won't be able to stop the container because of that known bug.
If this is confirmed, you may want to manually runc kill --all that container. We'll try to maintain backward compatibility and support in-place upgrades. However: 1) this is a bug in the old version, not a backward compatibility issue; and 2) it is always recommended to drain your node before upgrading critical node components like kubelet and containerd.
And we may want to document this in the next release note as a known issue that requires action.
@Random-Liu That's probably the case. The weave container has a create date of 2018-12-03T03:28:30.712112913Z according to ctr and that PR was merged on 12/4.
When I upgraded containerd, I did drain the node using kubectl drain but it doesn't remove daemonsets like weave. If kubectl is not enough, should I be using crictl or ctr to drain? Do I need to stop kubelet since daemonsets always get rescheduled? I couldn't find any documentation on the proper procedure to upgrade containerd so I just did the above, but it sounds like it was not enough. Maybe this issue can be turned into a documentation issue where steps to upgrade between major, minor and bugfix releases are provided?
Thanks again for your help tracking down these issues.
When I upgraded containerd, I did drain the node using kubectl drain but it doesn't remove daemonsets like weave.
I see. This is bad. :( We are about to release 1.2.2; we should include some instructions there.
This only happens when: 1) you upgrade containerd to 1.2+; and 2) you have a pod that was not drained from the node (e.g. a daemonset), and that pod: a) has a container that contains multiple processes; b) has processes that don't exit with the container init process; and c) is using a pod PID namespace.
If this is the case, you need to use runc kill --all to kill that container. :(
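Concretely, that cleanup could be scripted roughly as below. This is only a sketch, not an official procedure: kill_stopped_containers is a hypothetical name, the runc root shown is containerd's default for the k8s.io namespace, and RUNC/ROOT are overridable so the loop can be exercised against a stub instead of the real binary:

```shell
# Send SIGKILL to any processes still alive in containers that runc
# already reports as stopped (the leaked-process situation above).
kill_stopped_containers() {
  RUNC="${RUNC:-runc}"
  ROOT="${ROOT:-/run/containerd/runc/k8s.io}"
  "$RUNC" --root="$ROOT" list -q | while read -r cid; do
    if "$RUNC" --root="$ROOT" state "$cid" | grep -q '"status": "stopped"'; then
      echo "killing leaked processes in $cid"
      "$RUNC" --root="$ROOT" kill --all "$cid" KILL
    fi
  done
}

# Usage (as root, after confirming which containers are affected):
#   kill_stopped_containers
```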
We'll simplify this a bit in the release note.
@Random-Liu Release notes to allow upgrades for this specific issue are good, but more generic upgrade instructions on https://containerd.io/docs/ or on github would be much appreciated as well. I'm still confused on exact commands, especially if you don't know what containers you need to runc kill.
@steven-sheehy Does your weave pod have shareProcessNamespace: true in the pod spec?
No. For reference, this is the weave yaml we are applying.
OK. So hostPID pods are also affected...
What containerd version did you upgrade from to 1.2.1?
To be honest, I'm not sure exactly. It was either 1.2.1-rc.0 or 1.2.0. We've only recently switched to containerd so those are the only versions we've ever used.
@steven-sheehy I see.
I fully understand the problem now. The bug introduced by https://github.com/containerd/containerd/commit/b5ccc66c2c814cfc21d58678c854272427814b59 is soooooo bad... It only exists in 1.2 versions before 1.2.1. Because of the bug, you can never stop a multi-process host/pod PID namespace pod. I think we'll have to recommend a node REBOOT for the 1.2.x --> 1.2.1 update...
And I'll add an integration test for this case to prevent future regression.
As for upgrade instructions, here are the steps I came up with when using Kubernetes and systemd. I'm not sure if these steps will suffice to avoid the deadlock when upgrading 1.2.0 to 1.2.1 or if a node reboot is required. But I think something like these should be added to an upgrade section of the documentation. Do these steps look correct?
kubectl drain $HOSTNAME --ignore-daemonsets --delete-local-data
systemctl stop kubelet.service
crictl stopp $(crictl pods -q)
systemctl stop containerd.service
# update containerd binary via package manager or manually
systemctl start containerd.service
systemctl start kubelet.service
@steven-sheehy Thanks for the detailed instructions!
We currently don't have a place for upgrade instructions. The release notes seem to be the best place for this; we do a similar thing for Kubernetes release notes as well. :)
This has been included in the 1.2.2 release note https://github.com/containerd/containerd/releases/tag/v1.2.2.
Let's close this issue for now.
After restarting containerd via sudo systemctl restart containerd, I have pods stuck in various states (Terminating, Init, Running 0/1) indefinitely. Also, containerd is running but printing no logs and seems to be deadlocked; crictl is not able to connect to the socket. It seems the deadlock is here, waiting for the container IO to close?
containerd 1.2.1 kubernetes 1.11.6 Ubuntu 18.04
This is a different environment than my other issue #1014.