docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Docker fails to start containers with cgroup memory allocation error. #841

Open JakeBonek opened 4 years ago

JakeBonek commented 4 years ago

Expected behavior

Docker should successfully start the hello-world container.

Actual behavior

After a certain amount of time, docker fails to start any containers on a host with the following error:

[root@REDACTED]# docker run hello-world
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:279: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/fe4159ed6f4ec16af63ba0c2af53ec9c6b0c0c2ac42ff96f6816d5e28a821b4e: cannot allocate memory\"": unknown.
ERRO[0000] error waiting for container: context canceled

In the past, this issue has been resolved by restarting the Docker daemon or rebooting the machine, even though the daemon is active and running when the container is started. The machine has ample available memory and CPUs and should have no problem starting the container.

Steps to reproduce the behavior

Output of docker version:

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:03 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:25:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 39
 Running: 17
 Paused: 0
 Stopped: 22
Images: 39
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.1.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 503.6GiB
Name: REDACTED
ID: UK7O:GWIS:TFRJ:JDUB:5SS7:GH6W:TA4K:NBQC:7W4V:YLZJ:Q2AV:UBXA
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.): At the time of running the container, the host has 500GB of available memory and around 50+ free cores.

kai-cool-dev commented 4 years ago

Do you have swap active? Try disabling swap.
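
For example, to disable it until the next reboot (remove or comment out the swap entry in /etc/fstab to make the change permanent):

swapoff -a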

JakeBonek commented 4 years ago

Swap is disabled on the host.

thaJeztah commented 4 years ago

This could be related to a bug in the RHEL/CentOS kernels where kernel-memory cgroups don't work properly; we included a workaround for this in later versions of Docker to disable that feature: https://github.com/moby/moby/pull/38145 (backported to Docker 18.09 and up in https://github.com/docker/engine/pull/121)

Note that Docker 18.06 has reached EOL and won't be updated with this fix, so I recommend updating to a current version.

I'm closing this issue because of the above, but feel free to continue the conversation

maiconbaumx commented 4 years ago

Hello. I'm facing this same problem in my environment, and it seems like a bug because it randomly happens in a cluster with more than 350 containers. Is there a chance that this bug is present in current versions?

# docker --version
Docker version 19.03.5, build 633a0ea

# docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:25:41 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:24:18 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

#  containerd --version
containerd  1.2.10 b34a5c8af56e510852c35414db4c1f4fa6172339

#  uname -r
3.10.0-1062.4.3.el7.x86_64

guruprakashs commented 4 years ago

@thaJeztah

We are also seeing this issue in our cluster.

# docker run -it c7c39515eefe bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:275: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/56ca1a748e94176c378682012a8ad1a6cab3b812dfb1f34e9da303d47d8f0e97: cannot allocate memory\"": unknown.

These are the software versions that we are on. Could you please advise?

# docker info
Containers: 29
 Running: 19
 Paused: 0
 Stopped: 10
Images: 184
Server Version: 18.09.3
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.1.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 503.8GiB
Name: hostname.here
ID: QG35:QFQQ:ZLOZ:BZEC:SKL5:CDJ2:74VV:WFDO:5PCY:MJEN:VMQB:DNA5
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

# uname -r
3.10.0-957.1.3.el7.x86_64

# containerd --version
containerd github.com/containerd/containerd 1.2.4 e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e

Thanks

ntk148v commented 4 years ago

@thaJeztah I'm facing the exact same issue in my environment.

# uname -a
Linux monitor49 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# docker info
Containers: 14
 Running: 13
 Paused: 0
 Stopped: 1
Images: 54
Server Version: 18.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.5.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.7GiB
Name: monitor49
ID: 5T2R:BZFE:TQD3:LXSE:GUC7:5WNG:O5WY:CLJ2:FT62:J7ZX:EYB2:H67D
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 nexus.5f.cloud:8890
 nexus.5f.cloud:8891
 nexus.cloud:8890
 nexus.cloud:8891
 127.0.0.0/8
Live Restore Enabled: true

# docker-containerd --version
containerd github.com/containerd/containerd v1.1.1 d64c661f1d51c48782c9cec8fda7604785f93587

jpmenil commented 4 years ago

Same here: RHEL 7.7, kernel 3.10.0-1062.4.1.el7.x86_64, Docker version 19.03.5, build 633a0ea. @thaJeztah can you reopen the issue?

jpmenil commented 4 years ago

This is a continuation of this kernel bug, at least on RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1507149

petersbattaglia commented 4 years ago

Repros on CentOS 7, kernel Linux 3.10.0-1062.4.3.el7.x86_64, Docker version 19.03.5, build 633a0ea.

cccdemon commented 4 years ago

Same issue here.

CentOS 7, kernel: Linux linux.hostname.placeholder.it 3.10.0-1062.4.3.el7.x86_64 #1 SMP Wed Nov 13 23:58:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Docker version 19.03.5, build 633a0ea

Provisioned via Nomad

Log:

Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.033619039+01:00" level=error msg="9c9e6096b6b2855934d9a1a06250969d44466145f9a392f86b0515f34630288b cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.033708452+01:00" level=error msg="Handler for POST /containers/9c9e6096b6b2855934d9a1a06250969d44466145f9a392f86b0515f34630288b/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/9c9e6096b6b2855934d9a1a06250969d44466145f9a392f86b0515f34630288b: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth810fe6d entered promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth810fe6d: link is not ready
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered forwarding state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device vethf942213 entered promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): vethf942213: link is not ready
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered forwarding state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(vethd70c60e) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.164338118+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b/shim.sock" debug=false pid=106646
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.165050163+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914/shim.sock" debug=false pid=106647
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.170620429+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1/shim.sock" debug=false pid=106666
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.267713777+01:00" level=info msg="shim reaped" id=b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.275364215+01:00" level=info msg="shim reaped" id=114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.277650799+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.277696613+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.285452523+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.285484175+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.287996609+01:00" level=info msg="shim reaped" id=7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.297959225+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.297968748+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device vethf942213 left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 7(vethf942213) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(vethd70c60e) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device vethd70c60e left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(vethd70c60e) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.465478486+01:00" level=warning msg="b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth810fe6d left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 6(veth810fe6d) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.473303028+01:00" level=warning msg="114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b cleanup: failed to unmount IPC: umount /var/lib/docker/containers/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.521090337+01:00" level=warning msg="7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.578620238+01:00" level=error msg="114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.578710816+01:00" level=error msg="Handler for POST /containers/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/114d4d0d12a56762e6a5b3b3ba5c9490285203f264e1b855c999eead5b9e891b: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.581544749+01:00" level=error msg="b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.581584376+01:00" level=error msg="Handler for POST /containers/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/b27ad5a77e4469e1025d4311cf4a735e630c33907209cf31f472e8f909c7caf1: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.610861406+01:00" level=error msg="7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.610913300+01:00" level=error msg="Handler for POST /containers/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/7ab9f53ec0d561800e6b5b61e98f6be75777f154966a498eb4947d5a73723914: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth83d5462 entered promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth83d5462: link is not ready
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered blocking state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered forwarding state
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.767810035+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22/shim.sock" debug=false pid=106740
Dec 12 12:00:46 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:46.897232357+01:00" level=info msg="shim reaped" id=09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.908706574+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.908878386+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: device veth83d5462 left promiscuous mode
Dec 12 12:00:46 linux.hostname.placeholder.it kernel: docker0: port 5(veth83d5462) entered disabled state
Dec 12 12:00:46 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:46.976899282+01:00" level=warning msg="09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.058601763+01:00" level=error msg="09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.058+0100 [ERROR] client.driver_mgr.docker: failed to start container: driver=docker container_id=09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22 error="API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.058699552+01:00" level=error msg="Handler for POST /containers/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.179+0100 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=22c7c014-c45f-a3ec-1b72-e441f5efb57e task=core-drones-event-handler error="Failed to start container 09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/09e1d8749a5d3abd187233dcf6555dbb13e3512d26e9ad53088e1c8c3cc33c22: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.179+0100 [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=22c7c014-c45f-a3ec-1b72-e441f5efb57e task=core-drones-event-handler reason="Restart within policy" delay=17.065586128s
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device veth90db994 entered promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth90db994: link is not ready
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered forwarding state
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.229192684+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d/shim.sock" debug=false pid=106774
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.348654188+01:00" level=info msg="shim reaped" id=adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.358609610+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.358609645+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device veth90db994 left promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth90db994) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.458+0100 [INFO ] client.driver_mgr.docker: created container: driver=docker container_id=014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.461382343+01:00" level=warning msg="adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d cleanup: failed to unmount IPC: umount /var/lib/docker/containers/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device vethc153a0c entered promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): vethc153a0c: link is not ready
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered forwarding state
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.542195860+01:00" level=error msg="adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.542+0100 [ERROR] client.driver_mgr.docker: failed to start container: driver=docker container_id=adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d error="API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.542233359+01:00" level=error msg="Handler for POST /containers/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.551852060+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/shim.sock" debug=false pid=106820
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.658+0100 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=0f03d341-2db7-ef1f-ac3d-b46729121047 task=core-drones-sensor error="Failed to start container adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: API error (500): OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/docker/adb62738eb9315a239aac02d981ca0d5afbb7d66d99a977b6d9db134036df94d: cannot allocate memory\"": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.658+0100 [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=0f03d341-2db7-ef1f-ac3d-b46729121047 task=core-drones-sensor reason="Restart within policy" delay=15.333442815s
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.685596667+01:00" level=info msg="shim reaped" id=014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.695890735+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.695939782+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.757933654+01:00" level=warning msg="Error getting v2 registry: Get https://registry:5000/v2/: http: server gave HTTP response to HTTPS client"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.758010520+01:00" level=info msg="Attempting next endpoint for pull after error: Get https://registry:5000/v2/: http: server gave HTTP response to HTTPS client"
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device vethc153a0c left promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(vethc153a0c) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.819093578+01:00" level=warning msg="014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:47 linux.hostname.placeholder.it nomad[1733]: 2019-12-12T12:00:47.907+0100 [INFO ] client.driver_mgr.docker: created container: driver=docker container_id=44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: device veth70dc187 entered promiscuous mode
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth70dc187: link is not ready
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered blocking state
Dec 12 12:00:47 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered forwarding state
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.939283148+01:00" level=error msg="014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:47 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:47.939366568+01:00" level=error msg="Handler for POST /containers/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321: cannot allocate memory\\\"\": unknown"
Dec 12 12:00:47 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:47.993997045+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331/shim.sock" debug=false pid=106883
Dec 12 12:00:48 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:48.095195175+01:00" level=info msg="shim reaped" id=44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.105262650+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.105305452+01:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered blocking state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: device veth74cd792 entered promiscuous mode
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: IPv6: ADDRCONF(NETDEV_UP): veth74cd792: link is not ready
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered blocking state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 6(veth74cd792) entered forwarding state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: device veth70dc187 left promiscuous mode
Dec 12 12:00:48 linux.hostname.placeholder.it kernel: docker0: port 5(veth70dc187) entered disabled state
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.211845631+01:00" level=warning msg="44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331/mounts/shm, flags: 0x2: no such file or directory"
Dec 12 12:00:48 linux.hostname.placeholder.it containerd[1753]: time="2019-12-12T12:00:48.247687889+01:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/014a76bce64d20765a5bf2dc5b32fdb990e53a80a2fe3ea26343d88a62d41321/shim.sock" debug=false pid=106961
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.301734493+01:00" level=error msg="44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331 cleanup: failed to delete container from containerd: no such container"
Dec 12 12:00:48 linux.hostname.placeholder.it dockerd[1869]: time="2019-12-12T12:00:48.301789037+01:00" level=error msg="Handler for POST /containers/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331/start returned error: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/docker/44c9cef26857d695932d6d66ea218ea2a8c081732b5b3305fea7e540a65c2331: cannot allocate memory\\\"\": unknown"
jpmenil commented 4 years ago

This should be fixed with kernel-3.10.0-1075.el7.

hrnjan commented 4 years ago

Same issue here: CentOS Linux release 7.7.1908, kernel 3.10.0-1062.9.1.el7.x86_64, Docker version 19.03.5, build 633a0ea, 130+ containers (pods in k8s).

To resolve this issue we are going to replace the kernel with kernel-lt 4.4.206 from elrepo. We are still using iptables, so first we will need to reconfigure our hosts for nftables usage. Let us know if you find some kind of workaround for this issue.

JakeBonek commented 4 years ago

Same issue here: CentOS Linux release 7.7.1908, kernel 3.10.0-1062.9.1.el7.x86_64, Docker version 19.03.5, build 633a0ea, 130+ containers (pods in k8s).

To resolve this issue we are going to replace the kernel with kernel-lt 4.4.206 from elrepo. We are still using iptables, so first we will need to reconfigure our hosts for nftables usage. Let us know if you find some kind of workaround for this issue.

Just so you know, we've tried with various 4.x kernels as well and had the same issue.

hrnjan commented 4 years ago

Same issue here: CentOS Linux release 7.7.1908, kernel 3.10.0-1062.9.1.el7.x86_64, Docker version 19.03.5, build 633a0ea, 130+ containers (pods in k8s). To resolve this issue we are going to replace the kernel with kernel-lt 4.4.206 from elrepo. We are still using iptables, so first we will need to reconfigure our hosts for nftables usage. Let us know if you find some kind of workaround for this issue.

Just so you know, we've tried with various 4.x kernels as well and had the same issue.

Can you list the affected 4.x kernels, please? Thank you! We need to fix this, and finding the 'right' kernel seems to be the only way as far as I can see.

jpmenil commented 4 years ago

It took me around a week to trigger the issue, at which point I rebooted the host. If anyone can trigger this issue faster than me, could you test with the following kernel parameter: cgroup.memory=nokmem

kanthasamyraja commented 4 years ago

It took me around a week to trigger the issue, at which point I rebooted the host. If anyone can trigger this issue faster than me, could you test with the following kernel parameter: cgroup.memory=nokmem

I am also facing this issue with the mentioned Docker (19.03.5) and kernel (kernel-3.10.0-1062) versions on RHEL 7.7.

Could you also tell me where I should add this parameter?

jpmenil commented 4 years ago

@kanthasamyraja edit /etc/default/grub, then update the grub config.
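
For example, on CentOS/RHEL 7 the steps look roughly like this (a sketch for BIOS systems; EFI systems write the config elsewhere, e.g. /boot/efi/EFI/centos/grub.cfg):

Add cgroup.memory=nokmem to the GRUB_CMDLINE_LINUX line in /etc/default/grub, then:

grub2-mkconfig -o /boot/grub2/grub.cfg
reboot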

TBBle commented 4 years ago

@kanthasamyraja: Note that the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1507149 is not in kernel-3.10.0-1062, it's in kernel-3.10.0-1062.4.1 or later. If you're on CentOS 7, the required kernel is in the CentOS Updates repository, not the CentOS Base repository, which should be enabled by default.

Per https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c131 there is possibly a different bug that affects later kernels as well, which is what this ticket was reopened for by @jpmenil.

So if your kernel version was accurate, you should first upgrade to kernel-3.10.0-1062.4.1 to rule out https://bugzilla.redhat.com/show_bug.cgi?id=1507149.

Or you can distinguish them: when the newer issue hits,

meminfo data doesn't suggest a bloated slab usage, but a bloated page-cache usage instead.
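
For example, a rough way to compare the two from /proc/meminfo (a sketch):

grep -E '^(Slab|SReclaimable|SUnreclaim|Cached)' /proc/meminfo

A huge Slab/SUnreclaim figure points at the original slab leak; a dominant Cached figure points at the newer page-cache variant.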

kanthasamyraja commented 4 years ago

@kanthasamyraja: Note that the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1507149 is not in kernel-3.10.0-1062, it's in kernel-3.10.0-1062.4.1 or later. If you're on CentOS 7, the required kernel is in the CentOS Updates repository, not the CentOS Base repository, which should be enabled by default.

Per https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c131 there is possibly a different bug that affects later kernels as well, which is what this ticket was reopened for by @jpmenil.

So if your kernel version was accurate, you should first upgrade to kernel-3.10.0-1062.4.1 to rule out https://bugzilla.redhat.com/show_bug.cgi?id=1507149.

Or you can distinguish them: when the newer issue hits,

meminfo data doesn't suggest a bloated slab usage, but a bloated page-cache usage instead.

It is working for me now. I am using the versions below (RHEL 7.7).

$ sudo rpm -qa | grep kernel-3.10.0-1062
kernel-3.10.0-1062.9.1.el7.x86_64
kernel-3.10.0-1062.4.3.el7.x86_64
kernel-3.10.0-1062.7.1.el7.x86_64
$

Thanks for the information.

jpmenil commented 4 years ago

@thaJeztah, I think we can close (again) this one, since adding the cgroup.memory=nokmem kernel parameter does the trick.

bamb00 commented 4 years ago

@jpmenil I'm running RHEL 7.6, 3.10.0-957.1.3.el7.x86_64, and just want to be sure on applying the fix.

1 - Set the kernel parameter (cgroup.memory=nokmem) in /etc/default/grub
2 - Upgrade to kernel-3.10.0-1062.4.1.el7.x86_64 or higher
3 - I'm running docker version 18.06.1-ce. Do I need to upgrade docker?

Any additional steps not listed above?

Thanks in advance.

cofyc commented 4 years ago

Hi, if too many memory cgroups have leaked, new memory cgroups cannot be created and creation fails with "cannot allocate memory". You can check whether there are stale empty cgroups under /sys/fs/cgroup/memory.
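
For example, a quick way to count memory cgroups and spot empty leftovers (a sketch; adjust the docker path if your cgroups live elsewhere, e.g. under kubepods):

find /sys/fs/cgroup/memory -type d | wc -l
for d in /sys/fs/cgroup/memory/docker/*/; do [ -z "$(cat "$d/cgroup.procs")" ] && echo "empty: $d"; done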

jpmenil commented 4 years ago

@bamb00 only the kernel parameter is needed. no need to upgrade docker.

cofyc commented 4 years ago

@jpmenil Thanks! Verified that it works when cgroup.memory=nokmem is configured.

In https://bugzilla.redhat.com/show_bug.cgi?id=1507149, they mention that the issue has been fixed in kernel-3.10.0-1075.el7. Has anyone verified that?

mayconritzmann commented 4 years ago

Hello, today I had the same problem in my production environment.

My kernel was kernel-3.10.0-1062.9.1; after upgrading to kernel-3.10.0-1062.12.1, all containers started.

Does anyone have any other alternative? The problematic node is part of a k8s cluster.

hrnjan commented 4 years ago

Hello, today I had the same problem in my production environment.

My kernel was kernel-3.10.0-1062.9.1; after upgrading to kernel-3.10.0-1062.12.1, all containers started.

Does anyone have any other alternative? The problematic node is part of a k8s cluster.

As mentioned above, the fix is straightforward: set the kernel parameter cgroup.memory=nokmem, or alternatively upgrade your kernel.

vinayus commented 4 years ago

Not sure if this helps someone. I ended up with the same issue on a CentOS 7 box. I was on docker-19.03.6 and kernel 3.10.0-1062.12.1.el7.x86_64. All my k8s nodes were unstable and causing a mess.

I found out that I had previously done a yum update (across nodes while upgrading k8s), which updated my containerd version to 1.2.12.

So I removed docker and containerd and reinstalled them with the defaults (19.03.6 and 1.2.10). Things look stable here.

None of the posts related to kmem and kernel flags helped me, though.

mayconritzmann commented 4 years ago

I solved the problem by adding cgroup.memory=nokmem inside /etc/default/grub on the GRUB_CMDLINE_LINUX line.

After this:

grub2-mkconfig -o /boot/grub2/grub.cfg

reboot
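
After the reboot, you can verify that the parameter was applied:

grep cgroup.memory /proc/cmdline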

yazanmonshed commented 4 years ago

This is because of the cgroup v2 change in recent versions of the Linux kernel.

Check the article here

SuperSandro2000 commented 4 years ago

@YazanALMonshed you posted a link to some random Arabic anime site.

lordvlad commented 4 years ago

still hitting this error on CentOS Linux 7

$ rpm -qa | grep kernel-3.10.0-1062
kernel-3.10.0-1062.12.1.el7.x86_64
kernel-3.10.0-1062.9.1.el7.x86_64
$ uname -r
3.10.0-1062.12.1.el7.x86_64
$ docker --version
Docker version 19.03.6, build 369ce74a3c

We hit this about 6 or 8 weeks ago and upgraded the kernel, which we thought resolved it; unfortunately, it cropped up again last night, and we cannot start any new containers.

san360 commented 4 years ago

Same here:

  Warning  Failed            44m (x3922 over 13d)     kubelet, xxxx-xxxxx  Error: failed to start container "rules-configmap-reloader": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod1b9669e6-7fe2-11ea-85f3-00505608c440/rules-configmap-reloader: cannot allocate memory\"": unknown
  Warning  Failed            39m (x3916 over 13d)     kubelet, xxxx-xxxxx  Error: failed to start container "prometheus-config-reloader": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod1b9669e6-7fe2-11ea-85f3-00505608c440/prometheus-config-reloader: cannot allocate memory\"": unknown

jpmenil commented 4 years ago

As explained earlier, you will need to boot with the cgroup.memory=nokmem kernel parameter.

masaki-furuta commented 4 years ago

If that's not working, try systemd.unified_cgroup_hierarchy=0.
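
On Fedora, for example, this can be set with grubby instead of editing the grub config by hand (takes effect after a reboot):

sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"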

artazar commented 4 years ago

Tried the CentOS 7.8 kernel 3.10.0-1127.el7.x86_64 and still got the slab memory leak; it looks like the issue is only partially resolved, if at all. The following symptoms show the issue for me:

unable to ensure pod container exists: failed to create container for [kubepods burstable pod45a22b91-5381-4360-a74d-f4e2cd8aa7ac] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod45a22b91-5381-4360-a74d-f4e2cd8aa7ac: cannot allocate memory

# ls /sys/kernel/slab | wc -l
117199

Zilvermeeuw commented 4 years ago

I solved the problem by adding cgroup.memory=nokmem inside /etc/default/grub on the GRUB_CMDLINE_LINUX line.

After this:

grub2-mkconfig -o /boot/grub2/grub.cfg

reboot

This does not work on Fedora 31 and 32:
Fedora 31, kernel 5.6.15-200.fc31, docker 19.03.8
Fedora 32, kernel 5.6.15-300.fc32, docker 19.03.8

jpmenil commented 4 years ago

@Siddharth-Hari, do you have cgroup.memory=nokmem set in your kernel cmdline?

wwalker commented 4 years ago

I'm seeing this problem. I put cgroup.memory=nokmem in /etc/default/grub, ran grub2-mkconfig -o /boot/grub2/grub.cfg, and rebooted.

Verified that the cgroup.memory=nokmem parameter made it into the kernel command line:

2020-07-11 15:19:56 - wwalker@plutonium:~ ✓ $ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.7.7-200.fc32.x86_64 root=/dev/mapper/fedora_plutonium-root ro resume=/dev/mapper/fedora_plutonium-swap rd.lvm.lv=fedora_plutonium/root rd.luks.uuid=luks-d12074c3-3fe9-4de3-bbd8-170b1e464092 rd.lvm.lv=fedora_plutonium/swap cgroup.memory=nokmem

Still getting :

2020-07-11 15:23:12 - wwalker@plutonium:~ ✘ $ docker run --name unauthenticated-jupyter-notebook -p 8888:8888 -d jupyter/base-notebook start-notebook.sh --NotebookApp.token=''
c327d94b0f1a8fd5589dd78b4b373407027591aebf0eded3602e3bd1b0fbb37c
docker: Error response from daemon: OCI runtime create failed: this version of runc doesn't work on cgroups v2: unknown.
2020-07-11 15:23:19 - wwalker@plutonium:~ ✘ $

ideepika commented 4 years ago

I'm seeing this problem. I put cgroup.memory=nokmem in /etc/default/grub, ran grub2-mkconfig -o /boot/grub2/grub.cfg, and rebooted.

Verified that the cgroup.memory=nokmem parameter made it into the kernel command line:

2020-07-11 15:19:56 - wwalker@plutonium:~ ✓ $ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.7.7-200.fc32.x86_64 root=/dev/mapper/fedora_plutonium-root ro resume=/dev/mapper/fedora_plutonium-swap rd.lvm.lv=fedora_plutonium/root rd.luks.uuid=luks-d12074c3-3fe9-4de3-bbd8-170b1e464092 rd.lvm.lv=fedora_plutonium/swap cgroup.memory=nokmem

Still getting :

2020-07-11 15:23:12 - wwalker@plutonium:~ ✘ $ docker run --name unauthenticated-jupyter-notebook -p 8888:8888 -d jupyter/base-notebook start-notebook.sh --NotebookApp.token=''
c327d94b0f1a8fd5589dd78b4b373407027591aebf0eded3602e3bd1b0fbb37c
docker: Error response from daemon: OCI runtime create failed: this version of runc doesn't work on cgroups v2: unknown.
2020-07-11 15:23:19 - wwalker@plutonium:~ ✘ $

+1 and is blocking :(

ACK-lcn commented 4 years ago

Currently, you can avoid this problem by setting "cgroup.memory=nokmem" in the "GRUB_CMDLINE_LINUX" field of /etc/default/grub. Whether this solves the problem completely is still being tested.

This method also has some disadvantages:

1) If the node server is restarted, the pods will drift. If the node fleet is large, the upgrade operation will be very cumbersome and the business department will have comments, so we should communicate in advance.

ifelsefi commented 4 years ago

Hi

Does anyone know if 3.10.0-1127.19.1.el7 fixes the issue? I am on 3.10.0-1062.el7, so we should update.

mayconritzmann commented 4 years ago

That works for CentOS 7:

I solved the problem by adding cgroup.memory=nokmem inside /etc/default/grub on the GRUB_CMDLINE_LINUX line.

After this:

grub2-mkconfig -o /boot/grub2/grub.cfg

reboot

stemid commented 3 years ago

I had this issue out of the blue on an otherwise idle k8s v18 cluster with a pretty recent CentOS 7 kernel. I upgraded to the latest packages, added cgroup.memory=nokmem to the boot params with grubby, and haven't seen the issue since the reboot.

The upgrade was docker-ce 19.03.12-3 => 19.03.13-3 and kernel 3.10.0-1127.13.1 => 3.10.0-1127.19.1.
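
For reference, the grubby form of that change appends the parameter to all installed kernel entries (takes effect on the next reboot):

grubby --update-kernel=ALL --args="cgroup.memory=nokmem"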

llhuii commented 3 years ago

I had this issue with this kernel version:

[root@master debug]# uname -a
Linux master 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@master debug]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.8.2003 (Core)
Release:        7.8.2003
Codename:       Core

docker server version 19.03.12

fusionx86 commented 3 years ago

Are you all adding the cgroup.memory kernel parameter to master nodes as well? It seems to only apply to nodes where deployments are scheduled, but for consistency I'm wondering about the master nodes too.

GaboFDC commented 3 years ago

On all Red Hat-related distributions, it may also be related to the enablement of cgroups v2. See https://www.redhat.com/sysadmin/fedora-31-control-group-v2 and https://www.linuxuprising.com/2019/11/how-to-install-and-use-docker-on-fedora.html

BrianSidebotham commented 3 years ago

I'm here with this error, and it's because Fedora >= 31 has moved to cgroups v2. Using podman with the podman-docker interface works OK, except of course containers also need to support cgroups v2, and CentOS 7 does not. :(
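
A quick way to check which cgroup hierarchy a host is using (cgroup2fs means cgroups v2; tmpfs means the old v1 hierarchy):

stat -fc %T /sys/fs/cgroup/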

b-rohit commented 3 years ago

I have the same issue on Ubuntu 18.04

  Operating System: Ubuntu 18.04.5 LTS
            Kernel: Linux 4.15.0
      Architecture: x86-64
Client: Docker Engine - Community
 Version:           20.10.2
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:17:32 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.11
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       42e35e61f3
  Built:            Mon Jun  1 09:10:54 2020

dignajar commented 3 years ago

I'm facing the same issue, but I'm not sure whether it comes from the memory cgroup.

I tried creating cgroups myself and deleting them, and that works fine, but I still have the issue.

Logs from the Kubernetes node

Jan 19 13:15:43 xxxxxx kubelet[9279]: E0119 13:15:43.049088    9279 pod_workers.go:191] Error syncing pod e886905b-acf0-47df-8c5d-b20b07e7a824 ("xxxxxx(e886905b-acf0-47df-8c5d-b20b07e7a824)"), skipping: failed to ensure that the pod: e886905b-acf0-47df-8c5d-b20b07e7a824 cgroups exist and are correctly applied: failed to create container for [kubepods burstable pode886905b-acf0-47df-8c5d-b20b07e7a824] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/pode886905b-acf0-47df-8c5d-b20b07e7a824: cannot allocate memory

Kernel

Centos 7 - 3.10.0-1127.19.1.el7.x86_64

Could disabling kernel memory accounting with the kernel parameter cgroup.memory=nokmem produce some kind of overflow?

bcookatpcsd commented 3 years ago

Fedora 33 Server here, brand new install tonight. I added the kernel parameter with the Fedora-supplied Docker and could not get hello-world to work. Following https://docs.docker.com/engine/install/fedora/ removes the Fedora-supplied Docker and replaces it. I rebooted and removed the kernel parameter; the Docker images needed to be removed because of the overlay change, but after removing them, "things seem ok so far" (tm).