JakeBonek opened this issue 5 years ago
Do you have swap active? Try disabling swap!
Swap is disabled on the host.
This could be related to a bug in the RHEL/CentOS kernels where kernel-memory cgroups doesn't work properly; we included a workaround for this in later versions of docker to disable this feature; https://github.com/moby/moby/pull/38145 (backported to Docker 18.09 and up https://github.com/docker/engine/pull/121)
Note that Docker 18.06 reached EOL, and won't be updated with this fix, so I recommend updating to a current version.
I'm closing this issue because of the above, but feel free to continue the conversation
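For anyone triaging this later in the thread: a quick way to confirm the failure is in kernel cgroup creation rather than in Docker itself is to create a memory cgroup by hand. A minimal sketch, assuming a v1 memory controller mounted at /sys/fs/cgroup/memory; the directory name is just a placeholder:
# Try to create (and immediately remove) a memory cgroup by hand.
# On an affected host this mkdir fails with "Cannot allocate memory"
# even though plenty of RAM is free.
sudo mkdir /sys/fs/cgroup/memory/cgroup-repro-test && echo "cgroup creation OK"
sudo rmdir /sys/fs/cgroup/memory/cgroup-repro-test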
Hello. I'm facing this same problem in my environment, and it seems very much like a bug, because it randomly happens in a cluster with more than 350 containers. Is there a chance that this bug is present in the current versions?
# docker --version
Docker version 19.03.5, build 633a0ea
# docker version
Client: Docker Engine - Community
Version: 19.03.5
API version: 1.40
Go version: go1.12.12
Git commit: 633a0ea
Built: Wed Nov 13 07:25:41 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.5
API version: 1.40 (minimum version 1.12)
Go version: go1.12.12
Git commit: 633a0ea
Built: Wed Nov 13 07:24:18 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
# containerd --version
containerd 1.2.10 b34a5c8af56e510852c35414db4c1f4fa6172339
# uname -r
3.10.0-1062.4.3.el7.x86_64
@thaJeztah
We are also seeing this issue in our cluster.
# docker run -it c7c39515eefe bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:275: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/56ca1a748e94176c378682012a8ad1a6cab3b812dfb1f34e9da303d47d8f0e97: cannot allocate memory\"": unknown.
These are the software versions that we are on. Could you please advise?
# docker info
Containers: 29
Running: 19
Paused: 0
Stopped: 10
Images: 184
Server Version: 18.09.3
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.1.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 503.8GiB
Name: hostname.here
ID: QG35:QFQQ:ZLOZ:BZEC:SKL5:CDJ2:74VV:WFDO:5PCY:MJEN:VMQB:DNA5
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
# uname -r
3.10.0-957.1.3.el7.x86_64
# containerd --version
containerd github.com/containerd/containerd 1.2.4 e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
Thanks
@thaJeztah I'm facing the exact same issue in my environment.
# uname -a
Linux monitor49 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# docker info
Containers: 14
Running: 13
Paused: 0
Stopped: 1
Images: 54
Server Version: 18.06.0-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.5.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.7GiB
Name: monitor49
ID: 5T2R:BZFE:TQD3:LXSE:GUC7:5WNG:O5WY:CLJ2:FT62:J7ZX:EYB2:H67D
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
nexus.5f.cloud:8890
nexus.5f.cloud:8891
nexus.cloud:8890
nexus.cloud:8891
127.0.0.0/8
Live Restore Enabled: true
# docker-containerd --version
containerd github.com/containerd/containerd v1.1.1 d64c661f1d51c48782c9cec8fda7604785f93587
same here, RedHat 7.7, kernel 3.10.0-1062.4.1.el7.x86_64 with Docker version 19.03.5, build 633a0ea. @thaJeztah can you reopen the issue?
This is a continuation of this kernel bug, at least on RH: https://bugzilla.redhat.com/show_bug.cgi?id=1507149
Repros on CentOS 7, kernel Linux 3.10.0-1062.4.3.el7.x86_64, Docker version 19.03.5, build 633a0ea
Same issue here
CentOS 7, kernel: Linux linux.hostname.placeholder.it 3.10.0-1062.4.3.el7.x86_64 #1 SMP Wed Nov 13 23:58:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Docker version 19.03.5, build 633a0ea
Provisioned via Nomad
Log:
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.033619039+01:00" level=error msg="9c9e6096b6b2855934d9a1a06250969d44466145f9a392f86b0515f34630288b cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.033708452+01:00" level=error msg="Handler for POST /containers/9c9e6096b6b2855934d9a1a06250969d44466145f9a392f86b0515f34630288b/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/9c9e6096b6b2855934d9a1a06250969d44466145f9a392f86b0515f34630288b: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth810fe6d entered promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth810fe6d: link is not ready
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered forwarding state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device vethf942213 entered promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): vethf942213: link is not ready
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered forwarding state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(vethd70c60e) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.164338118+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b/shim.sock" debug=false pid=106646
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.165050163+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914/shim.sock" debug=false pid=106647
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.170620429+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1/shim.sock" debug=false pid=106666
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.267713777+01:00" level=info msg="shim reaped" id=b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.275364215+01:00" level=info msg="shim reaped" id=114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.277650799+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.277696613+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.285452523+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.285484175+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.287996609+01:00" level=info msg="shim reaped" id=7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.297959225+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.297968748+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device vethf942213 left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(vethd70c60e) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device vethd70c60e left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(vethd70c60e) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.465478486+01:00" level=warning msg="b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth810fe6d left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.473303028+01:00" level=warning msg="114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b cleanup: failed to unmount IPC: umount /var/lib/docker/containers/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.521090337+01:00" level=warning msg="7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.578620238+01:00" level=error msg="114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.578710816+01:00" level=error msg="Handler for POST /containers/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.581544749+01:00" level=error msg="b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.581584376+01:00" level=error msg="Handler for POST /containers/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.610861406+01:00" level=error msg="7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.610913300+01:00" level=error msg="Handler for POST /containers/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth83d5462 entered promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth83d5462: link is not ready
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered forwarding state
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.767810035+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22/shim.sock" debug=false pid=106740
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.897232357+01:00" level=info msg="shim reaped" id=09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.908706574+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.908878386+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth83d5462 left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.976899282+01:00" level=warning msg="09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.058601763+01:00" level=error msg="09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.058+0100 [ERROR] client.driver_mgr.docker: failed to start container: driver=docker container_id=09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22 error="API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.058699552+01:00" level=error msg="Handler for POST /containers/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.179+0100 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=22c7c014-c45f-a3ec-1b72-e441f5efb57e task=core-drones-event-handler error="Failed to start container 09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.179+0100 [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=22c7c014-c45f-a3ec-1b72-e441f5efb57e task=core-drones-event-handler reason="Restart within policy" delay=17.065586128s
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device veth90db994 entered promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth90db994: link is not ready
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered forwarding state
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.229192684+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d/shim.sock" debug=false pid=106774
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.348654188+01:00" level=info msg="shim reaped" id=adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.358609610+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.358609645+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device veth90db994 left promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.458+0100 [INFO ] client.driver_mgr.docker: created container: driver=docker container_id=014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.461382343+01:00" level=warning msg="adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d cleanup: failed to unmount IPC: umount /var/lib/docker/containers/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device vethc153a0c entered promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): vethc153a0c: link is not ready
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered forwarding state
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.542195860+01:00" level=error msg="adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.542+0100 [ERROR] client.driver_mgr.docker: failed to start container: driver=docker container_id=adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d error="API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.542233359+01:00" level=error msg="Handler for POST /containers/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.551852060+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/shim.sock" debug=false pid=106820
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.658+0100 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=0f03d341-2db7-ef1f-ac3d-b46729121047 task=core-drones-sensor error="Failed to start container adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.658+0100 [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=0f03d341-2db7-ef1f-ac3d-b46729121047 task=core-drones-sensor reason="Restart within policy" delay=15.333442815s
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.685596667+01:00" level=info msg="shim reaped" id=014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.695890735+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.695939782+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.757933654+01:00" level=warning msg="Error getting v2 registry: Get https://registry:5000/v2/: http: server gave HTTP response to HTTPS client"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.758010520+01:00" level=info msg="Attempting next endpoint for pull after error: Get https://registry:5000/v2/: http: server gave HTTP response to HTTPS client"
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device vethc153a0c left promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.819093578+01:00" level=warning msg="014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.907+0100 [INFO ] client.driver_mgr.docker: created container: driver=docker container_id=44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device veth70dc187 entered promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth70dc187: link is not ready
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered forwarding state
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.939283148+01:00" level=error msg="014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.939366568+01:00" level=error msg="Handler for POST /containers/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.993997045+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331/shim.sock" debug=false pid=106883
Dec 12 12:00:48 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:48.095195175+01:00" level=info msg="shim reaped" id=44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.105262650+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.105305452+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered blocking state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: device veth74cd792 entered promiscuous mode
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth74cd792: link is not ready
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered blocking state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered forwarding state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: device veth70dc187 left promiscuous mode
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.211845631+01:00" level=warning msg="44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:48 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:48.247687889+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/shim.sock" debug=false pid=106961
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.301734493+01:00" level=error msg="44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.301789037+01:00" level=error msg="Handler for POST /containers/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331: cannot allocate memory\\\"\": unknown"
Should be fixed with kernel-3.10.0-1075.el7
Same issue here: CentOS Linux release 7.7.1908 Kernel 3.10.0-1062.9.1.el7.x86_64 Docker version 19.03.5, build 633a0ea 130+ containers (pods in k8s)
To resolve this issue we are going to replace the kernel with kernel-lt 4.4.206 from elrepo. We are still using iptables, so first we will need to reconfigure our hosts for nftables usage. Let us know if you find some kind of workaround for this issue.
Just so you know, we've tried with various 4.x kernels as well and had the same issue.
Can you list the affected 4.x kernels please? Thank you! We need to fix this, so finding the 'right' kernel is the only way as far as I can see.
It took me around a week to trigger the issue, until I rebooted the host. If anyone can trigger this issue faster than me, it would be worth testing with the following kernel parameter: 'cgroup.memory=nokmem'
I am also facing this issue with the mentioned Docker (19.03.5) and kernel (kernel-3.10.0-1062) versions on RHEL 7.7.
Could you also tell me where I should add this parameter?
@kanthasamyraja edit /etc/default/grub, then update the grub config.
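A minimal sketch of that edit, assuming a BIOS-boot CentOS/RHEL 7 host (UEFI systems write the config to /boot/efi/EFI/centos/grub.cfg instead):
# Back up, then append the parameter to the GRUB_CMDLINE_LINUX line.
sudo cp /etc/default/grub /etc/default/grub.bak
sudo sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"/\1 cgroup.memory=nokmem"/' /etc/default/grub
# Regenerate the grub config; the parameter takes effect after a reboot.
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot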
@kanthasamyraja: Note that the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1507149 is not in kernel-3.10.0-1062; it's in kernel-3.10.0-1062.4.1 or later. If you're on CentOS 7, the required kernel is in the CentOS Updates repository, not the CentOS Base repository, which should be enabled by default.
Per https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c131 there is possibly a different bug that affects later kernels as well, which is what this ticket was reopened for by @jpmenil.
So if your kernel version was accurate, you should first upgrade to kernel-3.10.0-1062.4.1 to rule out https://bugzilla.redhat.com/show_bug.cgi?id=1507149. Or you can distinguish them: when the newer issue hits, meminfo data doesn't suggest bloated slab usage, but bloated page-cache usage instead.
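A quick way to apply that distinction, reading the relevant counters straight from /proc/meminfo (a diagnostic sketch only):
# Bloated Slab/SUnreclaim points at the original BZ 1507149 leak;
# bloated Cached with normal slab points at the newer issue.
grep -E '^(MemFree|Cached|Slab|SReclaimable|SUnreclaim):' /proc/meminfo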
It is working now for me. I am using the versions below (RHEL 7.7):
$ sudo rpm -qa | grep kernel-3.10.0-1062
kernel-3.10.0-1062.9.1.el7.x86_64
kernel-3.10.0-1062.4.3.el7.x86_64
kernel-3.10.0-1062.7.1.el7.x86_64
Thanks for the information.
@thaJeztah, I think we can close (again) this one, since adding the cgroup.memory=nokmem kernel parameter does the trick.
@jpmenil I'm running RH 7.6 3.10.0-957.1.3.el7.x86_64 and just want to be sure about applying the fix.
1 - Set the kernel parameter (cgroup.memory=nokmem) in /etc/default/grub
2 - Upgrade to kernel-3.10.0-1062.4.1.el7.x86_64 or higher
3 - I'm running Docker version 18.06.1-ce. Do I need to upgrade Docker?
Any additional steps not listed above?
Thanks in advance.
Hi, if you have leaked too many memory cgroups, new memory cgroups cannot be created, and creation will fail with "Cannot allocate memory".
You can check whether there are empty cgroups lingering in /sys/fs/cgroup/memory.
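For instance, a minimal sketch that counts v1 memory cgroups with no attached processes (note that on affected kernels, even cgroups removed with rmdir can linger as zombies pinning kernel memory):
# Count memory cgroups whose cgroup.procs file lists no processes.
find /sys/fs/cgroup/memory -mindepth 1 -type d | while read -r d; do
  [ -z "$(cat "$d/cgroup.procs" 2>/dev/null)" ] && echo "$d"
done | wc -l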
@bamb00 only the kernel parameter is needed; no need to upgrade Docker.
@jpmenil Thanks! Verified that it works when cgroup.memory=nokmem is configured.
In https://bugzilla.redhat.com/show_bug.cgi?id=1507149, they mentioned that the issue has been fixed in kernel-3.10.0-1075.el7. Did anyone verify it?
Hello, today I had the same problem in my production environment.
My kernel was kernel-3.10.0-1062.9.1; after upgrading to kernel-3.10.0-1062.12.1, all containers started.
Does anyone have any other alternative? The problem node is part of a k8s cluster.
As mentioned above, the fix is straightforward: set the kernel parameter cgroup.memory=nokmem, or, alternatively, fix the problem by upgrading your kernel.
Not sure if it helps someone. I ended up with the same issue on a CentOS 7 box. I was on docker-19.03.6 and kernel 3.10.0-1062.12.1.el7.x86_64. All my k8s nodes were unstable and causing a mess.
I found out that I had previously done a yum update (across nodes, while upgrading k8s) which updated my containerd version to 1.2.12.
So I removed docker and containerd and reinstalled them with the defaults (19.03.6 and 1.2.10). Things look stable here.
All the posts related to kmem and kernel flags didn't help me, though.
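If the stability really came from keeping containerd at 1.2.10, a sketch for stopping yum from silently upgrading it again (this assumes the yum versionlock plugin; the package names are the Docker CE ones used above):
sudo yum install -y yum-plugin-versionlock
# Lock docker and containerd at the known-good versions.
sudo yum versionlock add 'docker-ce-19.03.6*' 'containerd.io-1.2.10*'
sudo yum versionlock list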
I solved the problem by adding cgroup.memory=nokmem inside /etc/default/grub on the GRUB_CMDLINE_LINUX line.
After this:
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
this is because of the cgroups v2 change in the latest versions of the Linux kernel
check this Article here
@YazanALMonshed you posted a link to some random Arab anime site.
still hitting this error on CentOS Linux 7
$ rpm -qa | grep kernel-3.10.0-1062
kernel-3.10.0-1062.12.1.el7.x86_64
kernel-3.10.0-1062.9.1.el7.x86_64
$ uname -r
3.10.0-1062.12.1.el7.x86_64
$ docker --version
Docker version 19.03.6, build 369ce74a3c
We hit this about 6 or 8 weeks ago, upgraded the kernel, and thought it was resolved; unfortunately, it cropped up again last night, and we cannot start any new containers.
same here
Warning Failed 44m (x3922 over 13d) kubelet, xxxx-xxxxx Error: failed to start container "rules-configmap-reloader": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod1b9669e6-7fe2-11ea-85f3-00505608c440/rules-configmap-reloader: cannot allocate memory\"": unknown
Warning Failed 39m (x3916 over 13d) kubelet, xxxx-xxxxx Error: failed to start container "prometheus-config-reloader": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod1b9669e6-7fe2-11ea-85f3-00505608c440/prometheus-config-reloader: cannot allocate memory\"": unknown
As explained earlier, you will need to boot with the cgroup.memory=nokmem parameter.
If that's not working, try systemd.unified_cgroup_hierarchy=0.
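A quick verification sketch for both parameters, plus a check of which cgroup hierarchy the host is actually running (tmpfs means v1, cgroup2fs means the unified v2 hierarchy):
grep -o 'cgroup.memory=nokmem' /proc/cmdline || echo "nokmem not set"
grep -o 'systemd.unified_cgroup_hierarchy=0' /proc/cmdline || echo "cgroups v2 opt-out not set"
# The filesystem type of the cgroup mount tells you v1 vs v2.
stat -fc %T /sys/fs/cgroup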
Tried the CentOS 7.8 kernel 3.10.0-1127.el7.x86_64 and still got the slab memory leak, so it looks like the issue is not, or only partially, resolved. The following symptoms show the issue for me:
unable to ensure pod container exists: failed to create container for [kubepods burstable pod45a22b91-5381-4360-a74d-f4e2cd8aa7ac] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod45a22b91-5381-4360-a74d-f4e2cd8aa7ac: cannot allocate memory
# ls /sys/kernel/slab | wc -l
117199
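Besides the slab directory count, the live number of memory cgroups can be read from /proc/cgroups. On these v1 kernels the memcg ID space tops out at 65535, and leaked (zombie) memcgs exhaust it, at which point every new mkdir fails with ENOMEM. A small check, assuming that limit:
# Column 3 of /proc/cgroups is num_cgroups for each controller.
awk '$1 == "memory" {printf "memory cgroups in use: %s (ID limit 65535)\n", $3}' /proc/cgroups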
This (adding cgroup.memory=nokmem via /etc/default/grub) does not work on Fedora 31 and 32. Fedora 31, kernel 5.6.15-200.fc31, docker 19.03.8; Fedora 32, kernel 5.6.15-300.fc32, docker 19.03.8
@Siddharth-Hari, do you have cgroup.memory=nokmem set on your kernel cmdline?
I'm seeing this problem. I just rebooted after putting cgroup.memory=nokmem in /etc/default/grub, ran grub2-mkconfig -o /boot/grub2/grub.cfg, and rebooted.
Verified that cgroup.memory=nokmem made it into the kernel command line:
2020-07-11 15:19:56 - wwalker@plutonium:~ ✓ $ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.7.7-200.fc32.x86_64 root=/dev/mapper/fedora_plutonium-root ro resume=/dev/mapper/fedora_plutonium-swap rd.lvm.lv=fedora_plutonium/root rd.luks.uuid=luks-d12074c3-3fe9-4de3-bbd8-170b1e464092 rd.lvm.lv=fedora_plutonium/swap cgroup.memory=nokmem
Still getting:
2020-07-11 15:23:12 - wwalker@plutonium:~ ✘ $ docker run --name unauthenticated-jupyter-notebook -p 8888:8888 -d jupyter/base-notebook start-notebook.sh --NotebookApp.token=''
c327d94b0f1a8fd5589dd78b4b373407027591aebf0eded3602e3bd1b0fbb37c
docker: Error response from daemon: OCI runtime create failed: this version of runc doesn't work on cgroups v2: unknown.
2020-07-11 15:23:19 - wwalker@plutonium:~ ✘ $
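That last error is a different failure: a runc build without cgroups v2 support running on a Fedora 32 host that defaults to the unified hierarchy. One possible workaround (this is the general Fedora 31+ cgroups v2 opt-out documented by Red Hat, not something specific to this issue) is to boot the host back into cgroups v1:
# Switch all installed kernels back to the legacy (v1) cgroup hierarchy.
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo reboot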
+1 and is blocking :(
Currently, you can avoid this problem by setting cgroup.memory=nokmem on the GRUB_CMDLINE_LINUX line in /etc/default/grub. Whether this solves the problem completely is still being tested.
This method also has some disadvantages:
1) If the node server is restarted, its pods will drift to other nodes. If the node fleet is large, the rolling-reboot operation will be very cumbersome, and the business department will have comments, so communicate in advance.
Hi
Anyone know if 3.10.0-1127.19.1.el7 fixes the issue? I am at 3.10.0-1062.el7, so we should update.
That works for CentOS 7: adding cgroup.memory=nokmem to the GRUB_CMDLINE_LINUX line inside /etc/default/grub, then running grub2-mkconfig -o /boot/grub2/grub.cfg and rebooting.
I had this issue out of the blue on an otherwise idle k8s v18 cluster with a pretty recent CentOS 7 kernel. I upgraded to the latest packages, added cgroup.memory=nokmem to the boot params with grubby, and haven't seen the issue since the reboot.
The upgrade was docker-ce 19.03.12-3 => 19.03.13-3 and kernel 3.10.0-1127.13.1 => 3.10.0-1127.19.1.
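For reference, a grubby one-liner along those lines, which edits the command line for all installed kernels without hand-editing /etc/default/grub (a sketch; grubby ships on RHEL/CentOS/Fedora):
sudo grubby --update-kernel=ALL --args="cgroup.memory=nokmem"
# Confirm the argument was added, then reboot for it to take effect.
sudo grubby --info=ALL | grep -o 'cgroup.memory=nokmem' | head -1
sudo reboot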
I had this issue with this kernel version:
[root@master debug]# uname -a
Linux master 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@master debug]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.8.2003 (Core)
Release: 7.8.2003
Codename: Core
docker server version 19.03.12
Are you all adding the cgroup.memory kernel parameter to master nodes as well? It seems to only apply to nodes where deployments are scheduled, but for consistency, I'm wondering about the master nodes too.
On all Red Hat-related distributions, it may also be related to the enablement of cgroups v2; see https://www.redhat.com/sysadmin/fedora-31-control-group-v2 and https://www.linuxuprising.com/2019/11/how-to-install-and-use-docker-on-fedora.html
I'm here with this error, and it's because Fedora >= 31 has moved to cgroups v2. Using podman with the podman-docker interface works OK, except of course containers need to also support cgroups v2, and CentOS 7 does not. :(
I have the same issue on Ubuntu 18.04
Operating System: Ubuntu 18.04.5 LTS
Kernel: Linux 4.15.0
Architecture: x86-64
Client: Docker Engine - Community
Version: 20.10.2
API version: 1.40
Go version: go1.13.15
Git commit: 2291f61
Built: Mon Dec 28 16:17:32 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 19.03.11
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 42e35e61f3
Built: Mon Jun 1 09:10:54 2020
I'm facing the same issue, but I'm not sure if the issue comes from cgroup memory.
I tried creating and deleting cgroups myself and that works fine, but I still have the issue.
Logs from the Kubernetes node
Jan 19 13:15:43 xxxxxx kubelet[9279]: E0119 13:15:43.049088 9279 pod_workers.go:191] Error syncing pod e886905b-acf0-47df-8c5d-b20b07e7a824 ("xxxxxx(e886905b-acf0-47df-8c5d-b20b07e7a824)"), skipping: failed to ensure that the pod: e886905b-acf0-47df-8c5d-b20b07e7a824 cgroups exist and are correctly applied: failed to create container for [kubepods burstable pode886905b-acf0-47df-8c5d-b20b07e7a824] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/pode886905b-acf0-47df-8c5d-b20b07e7a824: cannot allocate memory
Kernel
Centos 7 - 3.10.0-1127.19.1.el7.x86_64
Could disabling memory accounting with the kernel parameter cgroup.memory=nokmem produce some overflow?
Fedora 33 Server here, brand new install tonight. I added the kernel parameter with the Fedora-supplied docker and could not get hello-world to work. Following https://docs.docker.com/engine/install/fedora/ removes the Fedora-supplied docker and replaces it. I rebooted and removed the kernel parameter; docker images needed to be removed because of overlay, but after removing the images, "things seem ok so far" (tm)
Expected behavior
Docker should successfully start the hello-world container.
Actual behavior
After a certain amount of time, docker fails to start any containers on a host with the following error:
[root@REDACTED]# docker run hello-world
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:279: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/fe4159ed6f4ec16af63ba0c2af53ec9c6b0c0c2ac42ff96f6816d5e28a821b4e: cannot allocate memory\"": unknown.
ERRO[0000] error waiting for container: context canceled
In the past, this issue has been fixed by restarting the docker daemon or rebooting the machine, even though the docker daemon is active and running at the time the container is started. The machine has ample available memory and CPUs and should have no problem starting the container.
Steps to reproduce the behavior
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.): At the time of running the container, the host has 500GB of available memory and around 50+ free cores.