distribution / distribution

The toolkit to pack, ship, store, and deliver container content
https://distribution.github.io/distribution
Apache License 2.0

SLUB: Unable to allocate memory on node -1 #3005

Closed: ThomasCassimon closed this issue 5 years ago.

ThomasCassimon commented 5 years ago

We also reported this issue to docker/for-linux.

Expected behavior

K8s/Docker works without a hitch on Ubuntu 16.04.

Actual behavior

While Docker containers are running on the server, the kernel logs the following errors (as shown by dmesg):

[319003.331580] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[319003.331587]   cache: mnt_cache(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 384, buffer size: 384, default order: 2, min order: 0
[319003.331591]   node 0: slabs: 20, objs: 776, free: 0
[319003.331594]   node 1: slabs: 14, objs: 556, free: 0
[319940.222707] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[319940.222714]   cache: blkdev_ioc(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 104, buffer size: 104, default order: 0, min order: 0
[319940.222718]   node 0: slabs: 2, objs: 78, free: 0
[319940.222721]   node 1: slabs: 4, objs: 156, free: 0
[320001.028578] SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
[320001.028582]   cache: kmalloc-128(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 128, buffer size: 128, default order: 1, min order: 0
[320001.028585]   node 0: slabs: 19, objs: 1216, free: 0
[320001.028587]   node 1: slabs: 18, objs: 1152, free: 0
[320004.629230] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[320004.629236]   cache: mnt_cache(9946:ea4c61d01895b46bf04a9b8c54602a4a6fff12ca7341b3b21f879414c120da79), object size: 384, buffer size: 384, default order: 2, min order: 0
[320004.629239]   node 0: slabs: 24, objs: 912, free: 0
[320004.629241]   node 1: slabs: 18, objs: 692, free: 0
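When triaging logs like the one above, it can help to group the SLUB failure reports by cache and check the per-node free counts. The following is a minimal Python sketch that does this for dmesg lines. It assumes the 4.4-era kernel naming convention in which a per-cgroup (memcg) kmem cache is printed as `<cache>(<css id>:<cgroup name>)`, so the long hex string above would be the cgroup (here, container) name. That interpretation, and the names `SlubFailure`, `parse_slub_failures`, and `summarize`, are assumptions for illustration. They are not part of Docker or the kernel.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# Header line printed for each failed SLUB allocation.
FAIL_RE = re.compile(
    r"SLUB: Unable to allocate memory on node (?P<node>-?\d+) \(gfp=(?P<gfp>0x[0-9a-f]+)\)"
)
# "cache:" line. ASSUMPTION: per-cgroup caches are named "<cache>(<css id>:<cgroup name>)".
CACHE_RE = re.compile(
    r"cache: (?P<cache>[\w-]+)(?:\((?P<css_id>\d+):(?P<cgroup>[^)]*)\))?, "
    r"object size: (?P<obj_size>\d+)"
)
# Per-NUMA-node slab statistics that follow the "cache:" line.
NODE_RE = re.compile(
    r"node (?P<node>\d+): slabs: (?P<slabs>\d+), objs: (?P<objs>\d+), free: (?P<free>\d+)"
)


@dataclass
class SlubFailure:
    gfp: str
    cache: str = ""
    css_id: Optional[int] = None
    cgroup: str = ""
    object_size: int = 0
    # node number -> (slabs, objs, free)
    nodes: Dict[int, Tuple[int, int, int]] = field(default_factory=dict)

    def exhausted(self) -> bool:
        """True if every reported node had no free objects left."""
        return bool(self.nodes) and all(free == 0 for _, _, free in self.nodes.values())


def parse_slub_failures(lines: Iterable[str]) -> List[SlubFailure]:
    """Group dmesg lines into one SlubFailure per 'Unable to allocate' report."""
    failures: List[SlubFailure] = []
    for line in lines:
        if m := FAIL_RE.search(line):
            failures.append(SlubFailure(gfp=m["gfp"]))
        elif failures and (m := CACHE_RE.search(line)):
            f = failures[-1]
            f.cache = m["cache"]
            f.css_id = int(m["css_id"]) if m["css_id"] else None
            f.cgroup = m["cgroup"] or ""
            f.object_size = int(m["obj_size"])
        elif failures and (m := NODE_RE.search(line)):
            failures[-1].nodes[int(m["node"])] = (
                int(m["slabs"]), int(m["objs"]), int(m["free"])
            )
    return failures


def summarize(failures: List[SlubFailure]) -> Counter:
    """Count failures per base cache name (e.g. mnt_cache, kmalloc-128)."""
    return Counter(f.cache for f in failures)
```

For example, passing `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()` to `parse_slub_failures` and then calling `summarize` on the result would show which caches fail most often and whether the failures all belong to the same cgroup, as they do in the excerpt above.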

Eventually the server crashes (about 3-4 days after Docker is first started), and the last entries visible in kern.log are these SLUB errors.

Related problems I found:
- https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s/
- opencontainers/runc#1725
- kubernetes/kubernetes#61937 (comment)

However, these issues concern CentOS, not Ubuntu. They also report blocked tasks, which our dmesg output does not show.

Steps to reproduce the behavior

Deploy a Docker + K8s + Rancher setup.

Output of docker version:

root@worker07:~# docker version
Client:
 Version:           18.09.8
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        0dd43dd87f
 Built:             Wed Jul 17 17:41:19 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.8
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       0dd43dd
  Built:            Wed Jul 17 17:07:25 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 177
 Running: 95
 Paused: 0
 Stopped: 82
Images: 61
Server Version: 18.09.8
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-159-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 125.8GiB
Name: worker07
ID: MQAU:C6U4:ZKSC:EBB5:PVVE:B64W:BQGK:KSUR:6CYX:K6KV:STGJ:GCS5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 REDACTED:30002
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

The servers are Dell PowerEdge R430s. One is configured as a master and the other as a slave; both show the problem. These are new servers with a clean Ubuntu 16.04 install.

Any ideas about the cause would be greatly appreciated.

thaJeztah commented 5 years ago

Oh! This issue tracker is for the open-source Docker registry, not for the Docker Engine (which is based on the moby codebase in https://github.com/moby/moby). I see an issue was also opened in https://github.com/docker/for-linux/issues/774, so let me close this one in favor of that one.