clearlinux / distribution

Placeholder repository to allow filing of general bugs/issues/etc against the Clear Linux OS for Intel Architecture linux distribution

Kubernetes container startup: no space left on device #2007

Open pohly opened 4 years ago

pohly commented 4 years ago

In the CI for PMEM-CSI, we bring up and tear down PMEM-CSI containers quite a lot. Recently we ran into this failure:

[2020-06-05T09:31:10.587Z] container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/sys\\\" to rootfs \\\"/var/lib/containers/storage/overlay/a12aa66fc67b2aff8ba5dc15537b03c6bedc64fa20ef79ebf4e69aa7e3b8d34d/merged\\\" at \\\"/var/lib/containers/storage/overlay/a12aa66fc67b2aff8ba5dc15537b03c6bedc64fa20ef79ebf4e69aa7e3b8d34d/merged/sys\\\" caused \\\"no space left on device\\\"\""

This affected all PMEM-CSI instances on all nodes:

$ kubectl get pods -o wide
NAME                    READY   STATUS              RESTARTS   AGE   IP            NODE                    NOMINATED NODE   READINESS GATES
pmem-csi-controller-0   2/2     Running             0          59s   10.244.0.17   pmem-csi-govm-master    <none>           <none>
pmem-csi-node-wf4wg     0/2     ContainerCreating   0          59s   <none>        pmem-csi-govm-worker1   <none>           <none>
pmem-csi-node-z9wzq     0/2     ContainerCreating   0          59s   <none>        pmem-csi-govm-worker3   <none>           <none>
pmem-csi-node-zc9cs     0/2     ContainerCreating   0          59s   <none>        pmem-csi-govm-worker2   <none>           <none>

But according to df, there is enough space:

clear@pmem-csi-govm-worker1~ $ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root      413468632 2881192 393792732   1% /
devtmpfs         1014560       0   1014560   0% /dev
tmpfs            1016552       0   1016552   0% /dev/shm
tmpfs            1016552    1032   1015520   1% /run
tmpfs            1016552       0   1016552   0% /sys/fs/cgroup
tmpfs            1016552       0   1016552   0% /tmp
tmpfs             203308       0    203308   0% /run/user/1000
clear@pmem-csi-govm-worker1~ $ df --inodes
Filesystem       Inodes IUsed    IFree IUse% Mounted on
/dev/root      23398400 51279 23347121    1% /
devtmpfs         253640   340   253300    1% /dev
tmpfs            254138     1   254137    1% /dev/shm
tmpfs            254138   597   253541    1% /run
tmpfs            254138    17   254121    1% /sys/fs/cgroup
tmpfs            254138    12   254126    1% /tmp
tmpfs            254138     5   254133    1% /run/user/1000
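
For CI runs where the node is torn down before anyone can log in, the same two checks can be captured programmatically at failure time. The following is only a minimal sketch, assuming Python is available on the node; `fs_usage` and `has_room` are hypothetical helper names, and the numbers come from `os.statvfs`, so they may differ slightly from what `df` prints.

```python
import os


def fs_usage(path):
    """Return free-space and free-inode figures for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        # Sizes in 1K blocks, to match the units df prints by default.
        "kb_total": st.f_blocks * st.f_frsize // 1024,
        "kb_available": st.f_bavail * st.f_frsize // 1024,
        "inodes_total": st.f_files,
        "inodes_free": st.f_ffree,
    }


def has_room(usage, min_kb=1, min_inodes=1):
    """True if the usage dict shows at least min_kb and min_inodes free."""
    # Some filesystems report 0 total inodes; treat that as "no inode limit".
    inodes_ok = usage["inodes_total"] == 0 or usage["inodes_free"] >= min_inodes
    return usage["kb_available"] >= min_kb and inodes_ok


if __name__ == "__main__":
    usage = fs_usage("/")
    print(usage, "has room:", has_room(usage))
```

If `has_room` returns true while container startup still fails with "no space left on device", as in the output above, the error is probably not about disk blocks or inodes on that filesystem.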

This was on Clear Linux 32690 with Kubernetes 1.17 and CRI-O as the container runtime.

Obviously this report doesn't have enough information to reproduce or fix the issue. I'm just reporting it so that we have a place to discuss it.

dotkrnl commented 3 years ago

I ran into a similar error message and solved it by unmounting all subdirectories (it seems Docker failed to clean them up). I am not sure whether this is the same issue, but I am posting it for your information.
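
To see whether leftover mounts are piling up before unmounting anything by hand, the mount table can be counted by path prefix. This is a rough sketch, not a confirmed diagnosis for this issue: `mount_points` and `count_under` are hypothetical helper names, the storage prefix is taken from the error message above, and the assumption that leaked mounts are related to the error is exactly that, an assumption.

```python
import os


def mount_points(mountinfo_text):
    """Return the mount point (5th field) of each line of mountinfo-format text."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def count_under(points, prefix):
    """Count mount points equal to prefix or located below it."""
    prefix = prefix.rstrip("/")
    return sum(1 for p in points if p == prefix or p.startswith(prefix + "/"))


if __name__ == "__main__":
    path = "/proc/self/mountinfo"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            points = mount_points(f.read())
        print("total mounts:", len(points))
        # Prefix assumed from the error message in this report.
        print("under container storage:",
              count_under(points, "/var/lib/containers/storage"))
```

A count under the storage prefix that keeps growing across container restarts would suggest that mounts are not being cleaned up.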