Vesyrak opened this issue 5 years ago.
Ping @kolyshkin, PTAL: could this be a bug in the Ubuntu kernel as well?
We've encountered the same problem on Ubuntu 16.04.6 LTS:
$ uname -a
Linux my-hostname 4.4.0-1096-aws #107-Ubuntu SMP Thu Oct 3 01:51:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ docker version
Client: Docker Engine - Community
Version: 19.03.4
API version: 1.40
Go version: go1.12.10
Git commit: 9013bf583a
Built: Fri Oct 18 15:53:51 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.4
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: 9013bf583a
Built: Fri Oct 18 15:52:23 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
$ docker info
Client:
Debug Mode: false
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 2
Server Version: 19.03.4
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 32
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-1096-aws
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.453GiB
Name: my-hostname
ID: IE6A:7VYA:YNOZ:SFG2:5KSQ:AMVB:2OMV:FXII:XS2D:XR2J:63P5:TTZA
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
provider=amazonec2
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
WARNING: the aufs storage-driver is deprecated, and will be removed in a future release.
$ dpkg -l docker-ce
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============================================================-====================================-====================================-=================================================================================================================================
ii docker-ce 5:19.03.4~3-0~ubuntu-xenial amd64 Docker: the open-source application container engine
Oct 31 03:19:56 my-hostname dockerd[1221]: time="2019-10-31T03:19:56.095504904Z" level=warning msg="failed to retrieve runc version: exit status 2"
Oct 31 03:19:56 my-hostname kernel: [222542.473969] SLUB: Unable to allocate memory on node 0 (gfp=0x2088020)
Oct 31 03:19:56 my-hostname kernel: [222542.473973] cache: blkdev_ioc(1639:9e1903e879a86489ae8491e5ed5dd7192461efc7e7fc3ea529d833230ca4d87f), object size: 104, buffer size: 104, default order: 0, min order: 0
Oct 31 03:19:56 my-hostname kernel: [222542.473975] node 0: slabs: 4, objs: 156, free: 0
Oct 31 03:19:56 my-hostname kernel: [222542.484236] SLUB: Unable to allocate memory on node 0 (gfp=0x2088020)
Oct 31 03:19:56 my-hostname kernel: [222542.484239] cache: blkdev_ioc(1639:9e1903e879a86489ae8491e5ed5dd7192461efc7e7fc3ea529d833230ca4d87f), object size: 104, buffer size: 104, default order: 0, min order: 0
Oct 31 03:19:56 my-hostname kernel: [222542.484241] node 0: slabs: 4, objs: 156, free: 0
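Each "cache:" line above names the slab cache that ran out of free objects. In blkdev_ioc(1639:<hash>) the parenthesised part appears to identify a per-cgroup copy of the cache (a numeric ID plus what looks like the container's cgroup name); that reading is an assumption, not something confirmed in this thread. As a minimal sketch, the following Python groups SLUB failures by cache and cgroup, which can show whether the failures cluster on one container. The function name slub_failures_by_cache is hypothetical.

```python
import re
from collections import Counter

# Matches the "cache:" line the kernel prints after each
# "SLUB: Unable to allocate memory" warning.
# Assumption: per-cgroup caches are printed as name(id:cgroup),
# root caches as a plain name followed by a comma.
CACHE_RE = re.compile(
    r"cache: (?P<name>[^(,\s]+)(?:\((?P<id>\d+):(?P<cgroup>[^)]*)\))?,"
)


def slub_failures_by_cache(lines):
    """Count SLUB allocation failures per (cache name, cgroup) pair.

    cgroup is None for caches printed without a parenthesised suffix.
    """
    counts = Counter()
    for line in lines:
        match = CACHE_RE.search(line)
        if match:
            counts[(match.group("name"), match.group("cgroup"))] += 1
    return counts
```

Pass it any iterable of log lines, such as an open kern.log file. For the two warnings quoted above it reports two failures for blkdev_ioc, both in the same cgroup.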
Also having this issue with Docker version 19.03.8, build afacb8b7f0:
chris@study:~$ docker version
Client: Docker Engine - Community
Version: 19.03.8
API version: 1.40
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:25:58 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:24:30 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
This is on Ubuntu 16.04.6 LTS (Linux version 4.4.0-176-generic).
I guess I need to try upgrading Ubuntu?
As far as I can see, not much has happened on this issue...
From what I've been able to find by googling, this seems to be related to kernel version 4.4.x; perhaps try upgrading the kernel.
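Every report in this thread is on a 4.4.0-* Ubuntu kernel (4.4.0-1096-aws, 4.4.0-176-generic, 4.4.0-131-generic). A minimal sketch for checking whether a host is on that series before planning an upgrade, assuming the release string starts with the usual major.minor prefix that uname -r prints. The helper names are hypothetical, and matching 4.4 does not by itself say whether a particular build carries a fix.

```python
import platform
import re


def kernel_series(release):
    """Return (major, minor) from a release string like '4.4.0-1096-aws'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def on_suspected_series(release=None, suspected=(4, 4)):
    """True if the given (or currently running) kernel is in the suspected series."""
    if release is None:
        # On Linux this is the same string that `uname -r` prints.
        release = platform.release()
    return kernel_series(release) == suspected
```

Comparing integer pairs rather than string prefixes keeps, for example, 4.15.0-20-generic from being confused with the 4.4 series.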
Having a similar issue.
Hi @Vesyrak, may I ask what kind of crashes you are seeing?
We have a similar problem:
1. "SLUB: Unable to allocate memory on node -1" errors in dmesg, nearly 754 of them in 24 hours.
2. The oom-killer kills certain specific processes (which then restart), nearly 70 times in 24 hours.
3. "EXT4-fs error (device dm-0) in ext4_truncate:3932: Out of memory" occurs, followed by "Aborting journal on device dm-0-8." and "EXT4-fs (dm-0): Remounting filesystem read-only".
The above problems have occurred on the same server over the past 3 months; each time we have to reboot the server and then repair the file system to recover it.
The dmesg log is attached.
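To track the rates quoted above (SLUB failures and OOM kills per 24 hours, plus the EXT4 read-only remount), here is a minimal Python sketch that tallies these messages in a dmesg or kern.log dump. The category names are hypothetical; the patterns are the message texts quoted in this thread, plus the kernel's "invoked oom-killer" line, which is assumed here to be printed once per OOM-killer invocation. The EXT4 "Out of memory" error is deliberately not counted as an OOM kill.

```python
import re
from collections import Counter

# Hypothetical category names mapped to message fragments seen in this thread.
SYMPTOMS = {
    "slub_alloc_failure": re.compile(r"SLUB: Unable to allocate memory on node"),
    # Assumed to appear once per OOM-killer invocation.
    "oom_killer": re.compile(r"invoked oom-killer"),
    "ext4_error": re.compile(r"EXT4-fs error"),
    "ext4_remount_ro": re.compile(r"Remounting filesystem read-only"),
}


def count_symptoms(lines):
    """Tally how many log lines match each symptom pattern."""
    counts = Counter({name: 0 for name in SYMPTOMS})
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Running it over a 24-hour window of kernel logs gives numbers directly comparable to the 754 SLUB errors and 70 OOM kills reported above.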
Below is the setup info:
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:17:20 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.1-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:15:30 2018
OS/Arch: linux/amd64
Experimental: false
$ docker info
Containers: 589
Running: 531
Paused: 0
Stopped: 58
Images: 773
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-131-generic
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 80
Total Memory: 125.3GiB
Name: supOS
ID: Y6P6:4PJV:U32M:QXUE:75ZN:6A2K:XASZ:JOBJ:QXNN:TW5B:EVZZ:757V
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
registry.supos.ai
registry:5000
192.168.20.20:5000
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
$ uname -a
Linux supOS 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Expected behavior
K8s/Docker works without a hitch on Ubuntu 16.04.
Actual behavior
When Docker containers are running on the server, dmesg reports the following errors.
Eventually the server crashes (about 3-4 days after Docker is first started), and the last entries visible in kern.log are the SLUB errors.
Related problems I found:
https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s/
https://github.com/opencontainers/runc/issues/1725
https://github.com/kubernetes/kubernetes/issues/61937#issuecomment-417265738
However, these issues concern CentOS rather than Ubuntu. Additionally, they report blocked tasks, which do not appear in our dmesg.
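One way to double-check the absence of blocked tasks is to scan the kernel log for the hung-task warning, which is assumed here to have the form "INFO: task <name>:<pid> blocked for more than <N> seconds." (the format used by the kernel's hung-task detector). A minimal Python sketch; the function name blocked_tasks is hypothetical.

```python
import re

# Assumed format of the kernel's hung-task warning.
# The lazy name group tolerates colons inside the command name.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(lines):
    """Return (command, pid, seconds) for every hung-task warning found."""
    found = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            found.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return found
```

An empty result for the period leading up to a crash is consistent with the observation that no tasks were reported as blocked on these servers.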
Steps to reproduce the behavior
Deploy a Docker + K8s + Rancher setup
Output of docker version:
Output of docker info:
The servers are Dell PowerEdge R430s. One is configured as a master and the other as a slave; both have the problem. These are new servers on which a clean 16.04 image was installed.
Any ideas on what could be causing this would be greatly appreciated.