docker / for-linux

Docker Engine for Linux
SLUB: Unable to allocate memory on node -1 #774

Open Vesyrak opened 5 years ago

Vesyrak commented 5 years ago

Expected behavior

K8s/Docker works without a hitch on Ubuntu 16.04.

Actual behavior

When Docker containers are running on the server, dmesg shows the following errors.

[319003.331580] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[319003.331587]   cache: mnt_cache(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 384, buffer size: 384, default order: 2, min order: 0
[319003.331591]   node 0: slabs: 20, objs: 776, free: 0
[319003.331594]   node 1: slabs: 14, objs: 556, free: 0
[319940.222707] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[319940.222714]   cache: blkdev_ioc(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 104, buffer size: 104, default order: 0, min order: 0
[319940.222718]   node 0: slabs: 2, objs: 78, free: 0
[319940.222721]   node 1: slabs: 4, objs: 156, free: 0
[320001.028578] SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
[320001.028582]   cache: kmalloc-128(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 128, buffer size: 128, default order: 1, min order: 0
[320001.028585]   node 0: slabs: 19, objs: 1216, free: 0
[320001.028587]   node 1: slabs: 18, objs: 1152, free: 0
[320004.629230] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[320004.629236]   cache: mnt_cache(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 384, buffer size: 384, default order: 2, min order: 0
[320004.629239]   node 0: slabs: 24, objs: 912, free: 0
[320004.629241]   node 1: slabs: 18, objs: 692, free: 0

Eventually, the server crashes (after about 3-4 days from the first Docker start), and the last entries visible in kern.log are the SLUB errors.
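To see which caches are failing and how often, the dmesg output above can be tallied with a short script. This is only a sketch: the function name `tally_slub_failures` is hypothetical, and it assumes that the part in parentheses after the cache name is a numeric id followed by a cgroup name (here a 64-character string that looks like a container id), which the report itself does not confirm.

```python
import re
from collections import Counter

# Matches the "cache:" detail line that follows each SLUB failure.
# Assumption: the parenthesised suffix is "<numeric id>:<cgroup name>".
CACHE_RE = re.compile(r"cache: (\S+?)\((\d+):([^)]+)\)")


def tally_slub_failures(lines):
    """Count failures per (cache name, first 12 characters of the cgroup name)."""
    counts = Counter()
    for line in lines:
        match = CACHE_RE.search(line)
        if match:
            cache, _numeric_id, cgroup = match.groups()
            counts[(cache, cgroup[:12])] += 1
    return counts


# The four "cache:" lines from the dmesg output in this report, trimmed after the suffix.
CGROUP = "ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79"
sample = [
    f"[319003.331587]   cache: mnt_cache(9946:{CGROUP}), object size: 384",
    f"[319940.222714]   cache: blkdev_ioc(9946:{CGROUP}), object size: 104",
    f"[320001.028582]   cache: kmalloc-128(9946:{CGROUP}), object size: 128",
    f"[320004.629236]   cache: mnt_cache(9946:{CGROUP}), object size: 384",
]

slub_counts = tally_slub_failures(sample)
for (cache, cgroup), n in slub_counts.most_common():
    print(f"{cache:12s} {cgroup} {n}")
```

If every failure carries the same suffix, as in the sample, that would point at a single container or cgroup rather than the host as a whole. Running it over the full kern.log (for example by passing an open file object as `lines`) gives the complete picture.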

Related problems I found:

However, these issues are related to CentOS and not Ubuntu. Additionally, these issues claim tasks are blocked, which doesn't happen according to our dmesg.

Steps to reproduce the behavior

Deploy a Docker + K8s + Rancher setup

Output of docker version:

root@worker07:~# docker version
 Version:           18.09.8
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        0dd43dd87f
 Built:             Wed Jul 17 17:41:19 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
  Version:          18.09.8
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       0dd43dd
  Built:            Wed Jul 17 17:07:25 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 177
 Running: 95
 Paused: 0
 Stopped: 82
Images: 61
Server Version: 18.09.8
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
  Profile: default
Kernel Version: 4.4.0-159-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 125.8GiB
Name: worker07
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Insecure Registries:
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

The servers are Dell PowerEdge R430s. One is configured as a master and the other as a slave; both have the problem. These are new servers on which a clean 16.04 image was installed.

Any idea what the cause could be would be greatly appreciated.

thaJeztah commented 5 years ago

ping @kolyshkin PTAL - could this be a bug in the Ubuntu kernel as well?

joschi commented 4 years ago

We've encountered the same problem on Ubuntu 16.04.6 LTS:

$ uname -a
Linux my-hostname 4.4.0-1096-aws #107-Ubuntu SMP Thu Oct 3 01:51:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:53:51 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
  Version:          19.03.4
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       9013bf583a
  Built:            Fri Oct 18 15:52:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
  Version:          0.18.0
  GitCommit:        fec3683
$ docker info
 Debug Mode: false

 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 19.03.4
 Storage Driver: aufs
  Root Dir: /var/lib/docker/aufs
  Backing Filesystem: extfs
  Dirs: 32
  Dirperm1 Supported: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
   Profile: default
 Kernel Version: 4.4.0-1096-aws
 Operating System: Ubuntu 16.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.453GiB
 Name: my-hostname
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
 Live Restore Enabled: false

WARNING: No swap limit support
WARNING: the aufs storage-driver is deprecated, and will be removed in a future release.
$ dpkg -l docker-ce
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                                           Version                              Architecture                         Description
ii  docker-ce                                                      5:19.03.4~3-0~ubuntu-xenial          amd64                                Docker: the open-source application container engine
Oct 31 03:19:56 my-hostname dockerd[1221]: time="2019-10-31T03:19:56.095504904Z" level=warning msg="failed to retrieve runc version: exit status 2"
Oct 31 03:19:56 my-hostname kernel: [222542.473969] SLUB: Unable to allocate memory on node 0 (gfp=0x2088020)
Oct 31 03:19:56 my-hostname kernel: [222542.473973]   cache: blkdev_ioc(1639:9e1903e879a86489ae8491e5ed5dd7192461efc7e7fc3ea529d833230ca4d87f), object size: 104, buffer size: 104, default order: 0, min order: 0
Oct 31 03:19:56 my-hostname kernel: [222542.473975]   node 0: slabs: 4, objs: 156, free: 0
Oct 31 03:19:56 my-hostname kernel: [222542.484236] SLUB: Unable to allocate memory on node 0 (gfp=0x2088020)
Oct 31 03:19:56 my-hostname kernel: [222542.484239]   cache: blkdev_ioc(1639:9e1903e879a86489ae8491e5ed5dd7192461efc7e7fc3ea529d833230ca4d87f), object size: 104, buffer size: 104, default order: 0, min order: 0
Oct 31 03:19:56 my-hostname kernel: [222542.484241]   node 0: slabs: 4, objs: 156, free: 0

LinuxLover9 commented 4 years ago

Also having this issue with

Docker version 19.03.8, build afacb8b7f0
chris@study:~$ docker version
Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:25:58 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:30 2020
  OS/Arch:          linux/amd64
  Experimental:     false
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
  Version:          0.18.0
  GitCommit:        fec3683

on Ubuntu 16.04.6 LTS, Linux version 4.4.0-176-generic.
I guess I need to try upgrading Ubuntu? As far as I can see, not much has happened on this issue...

sorenmat commented 4 years ago

This seems to be related to kernel version 4.4.x. From what I've been able to find by googling, upgrading the kernel might help.
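All the reports in this thread are on 4.4.0-* kernels. A quick way to check whether a host is in that series is sketched below; the helper name `is_4_4_kernel` is hypothetical, and a match only means the host runs the same kernel series as the reports here, not that it is affected.

```python
import platform
import re


def is_4_4_kernel(release):
    """Return True if a release string such as '4.4.0-159-generic' is in the 4.4 series."""
    match = re.match(r"(\d+)\.(\d+)", release)
    return bool(match) and (int(match.group(1)), int(match.group(2))) == (4, 4)


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    release = platform.release()
    status = "is" if is_4_4_kernel(release) else "is not"
    print(f"Kernel {release} {status} in the 4.4 series")
```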

Wood-Xia commented 2 years ago

We have a similar issue.

"... the server crashes (after about 3-4 days ..."

Hi @Vesyrak, may I ask what kind of crash you are seeing?

We have a similar problem:

  1. A large number of SLUB: Unable to allocate memory on node -1 errors in dmesg, about 754 in 24 hours.
  2. Many oom-killer invocations against certain processes (which restart after being killed), nearly 70 times in 24 hours.
  3. An EXT4-fs error (device dm-0) in ext4_truncate:3932: Out of memory occurs, followed by Aborting journal on device dm-0-8.
  4. Finally, the file system goes read-only with the message EXT4-fs (dm-0): Remounting filesystem read-only.

This has been happening on the same server for the past 3 months; each time we have to reboot the server and repair the file system to recover.
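Per-symptom counts like the ones in the list above can be reproduced from a saved dmesg or kern.log with a small classifier. This is a sketch, not the method used to obtain the numbers in this comment: the symptom names and `count_symptoms` are hypothetical, the patterns are plain substring matches on the messages quoted above, and the oom-killer sample line is illustrative rather than taken from the attached log.

```python
import re
from collections import Counter

# One pattern per symptom; a line is counted under the first pattern it matches.
SYMPTOMS = [
    ("slub_alloc_failure", re.compile(r"SLUB: Unable to allocate memory")),
    ("oom_killer", re.compile(r"oom-killer")),
    ("ext4_error", re.compile(r"EXT4-fs error")),
    ("journal_abort", re.compile(r"Aborting journal")),
    ("remount_read_only", re.compile(r"Remounting filesystem read-only")),
]


def count_symptoms(lines):
    """Return a Counter mapping each symptom name to its number of matching lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOMS:
            if pattern.search(line):
                counts[name] += 1
                break
    return counts


sample_log = [
    "SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)",
    "SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)",
    "some-process invoked oom-killer",  # illustrative line, not from the attached log
    "EXT4-fs error (device dm-0) in ext4_truncate:3932: Out of memory",
    "Aborting journal on device dm-0-8.",
    "EXT4-fs (dm-0): Remounting filesystem read-only",
]

symptom_counts = count_symptoms(sample_log)
for name, _pattern in SYMPTOMS:
    print(f"{name:20s} {symptom_counts[name]}")
```

To restrict the counts to a 24-hour window, the input lines would need to be filtered by timestamp first, which depends on whether the log carries wall-clock times (kern.log) or seconds since boot (dmesg).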

The dmesg log is attached.

Below is the setup info:

 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false

docker info
Containers: 589
 Running: 531
 Paused: 0
 Stopped: 58
Images: 773
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 4.4.0-131-generic
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 80
Total Memory: 125.3GiB
Name: supOS
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Insecure Registries:
Live Restore Enabled: false

WARNING: No swap limit support

uname -a
Linux supOS 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux