docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Unexpected and unexplained OOM killing of JVM container #653


rdzimmer-zz commented 5 years ago

Expected behavior

The RHEL kernel OOM-kills a docker container only when the container's actual memory usage (RSS) reaches the memory limit set by kubernetes and docker.

Actual behavior

A container running a single JVM process is being OOM-killed when I do not expect it to be. The JVM's Xmx is 2GB, and its RSS is no more than 2.2GB as reported by ps on the system and by kubernetes/heapster. The kubernetes/docker memory limit has been set to 3GB to leave room for the JVM's native memory usage above the Xmx.
The workingSetBytes and usageBytes metrics, as well as docker stats, start at values matching the RSS, but grow over time until they hit the 3GB limit and the container is OOM-killed. The RSS stays at 2.2GB the whole time.
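For reference, that configuration boils down to a 2GB heap inside a 3GB memory cgroup, leaving roughly 1GB of headroom for metaspace, thread stacks, direct buffers, and other native allocations. A minimal sketch with plain docker run (the image name and jar path below are placeholders, not the actual deployment; in Kubernetes the equivalent is resources.limits.memory: 3Gi on the container):

# Illustrative only: 3GB cgroup limit around a JVM with a 2GB heap.
# "kairosdb-image" and the jar path are placeholders.
docker run -d --name kairosdb -m 3g kairosdb-image \
    java -Xmx2g -jar /opt/kairosdb/kairosdb.jar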

Steps to reproduce the behavior

The process is a KairosDB JVM. Additional details are recorded in the forum post https://groups.google.com/forum/#!topic/kairosdb-group/aVm8M7yYXIA. KairosDB does minimal disk activity, as it uses Cassandra as its datastore. However, it occasionally writes to /tmp/kairos_cache/, and those files are deleted shortly after being written. What I am seeing is that usageBytes grows whenever these writes are happening. I am monitoring the I/O activity of KairosDB with iotop; it is only doing a few MB of writes every 5 minutes.
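A rough way to watch for that correlation from the host (the cgroup path and container name below are placeholders, not the exact IDs from this deployment):

# Sketch: correlate writes under /tmp/kairos_cache (inside the container) with
# the page-cache counters of the container's memory cgroup (on the host).
CG=/sys/fs/cgroup/memory/kubepods/burstable/<pod-uid>/<container-id>
while true; do
    date
    docker exec <container> du -sk /tmp/kairos_cache 2>/dev/null
    grep -E '^(cache|total_active_file|total_inactive_file) ' "$CG/memory.stat"
    sleep 60
done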

Here is the OOM message.

Apr 25 00:33:10 work4 kernel: Task in /kubepods/burstable/pod2e255ab3-624c-11e9-81f6-06c1f8d31d89/f9972a566c2766c89035d32926db36299b54e076271954aa10ee7a21c2debcac killed as a result of limit of /kubepods/burstable/pod2e255ab3-624c-11e9-81f6-06c1f8d31d89/f9972a566c2766c89035d32926db36299b54e076271954aa10ee7a21c2debcac
Apr 25 00:33:10 work4 kernel: memory: usage 3072000kB, limit 3072000kB, failcnt 1651
Apr 25 00:33:10 work4 kernel: memory+swap: usage 3072000kB, limit 3072000kB, failcnt 0
Apr 25 00:33:10 work4 kernel: kmem: usage 854836kB, limit 9007199254740988kB, failcnt 0
Apr 25 00:33:10 work4 kernel: Memory cgroup stats for /kubepods/burstable/pod2e255ab3-624c-11e9-81f6-06c1f8d31d89/f9972a566c2766c89035d32926db36299b54e076271954aa10ee7a21c2debcac: cache:48KB rss:2217116KB rss_huge:960512KB mapped_file:20KB swap:0KB inactive_anon:612264KB active_anon:1604848KB inactive_file:20KB active_file:12KB unevictable:4KB
Apr 25 00:33:10 work4 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Apr 25 00:33:10 work4 kernel: [20197]  1000 20197     1160      173       8        0           926 sh
Apr 25 00:33:10 work4 kernel: [20223]  1000 20223     1160      199       8        0           926 start_time_seri
Apr 25 00:33:10 work4 kernel: [20281]  1000 20281     1160      190       7        0           926 delete_rotated_
Apr 25 00:33:10 work4 kernel: [20282]  1000 20282  1722846   557144    1210        0           926 java
Apr 25 00:33:10 work4 kernel: [31726]  1000 31726     1136      103       7        0           926 sleep
Apr 25 00:33:10 work4 kernel: Memory cgroup out of memory: Kill process 21499 (qtp806894278-18) score 1653 or sacrifice child
Apr 25 00:33:10 work4 kernel: Killed process 20282 (java) total-vm:6891384kB, anon-rss:2214752kB, file-rss:13804kB, shmem-rss:0kB

I've started logging docker stats, ps, and the cgroup memory.stat each minute (a sketch of the logging loop follows the output below). Here are some statistics from near the start of a test run (once everything is up and initialized) and from several hours later in the run. For ease of reading I've merged the two results into one output:

docker stats {{.MemUsage}}
2.079GiB / 2.93GiB   ->   2.695GiB / 2.93GiB
ps aux
2144252 (RSS KB)   ->   2226124 (RSS KB)
cat /sys/fs/cgroup/memory/kubepods/burstable/pod2e255ab3-624c-11e9-81f6-06c1f8d31d89/dd630103a0803f7934a14f3d3dc92c64b5442de6bd1aba23371bbc50affe3c1e/memory.stat
cache 434176   ->   3706880
rss 2187423744   ->   2268381184
rss_huge 1029701632   ->   1029701632
mapped_file 40960   ->   40960
swap 0   ->   0
pgpgin 524174   ->   2189562
pgpgout 240930   ->   1885754
pgfault 838775   ->   6379585
pgmajfault 0   ->   0
inactive_anon 0   ->   0
active_anon 2187399168   ->   2268377088
inactive_file 28672   ->   69632
active_file 405504   ->   3629056
unevictable 4096   ->   4096
hierarchical_memory_limit 3145728000   ->   3145728000
hierarchical_memsw_limit 3145728000   ->   3145728000
total_cache 434176   ->   3706880
total_rss 2187423744   ->   2268381184
total_rss_huge 1029701632   ->   1029701632
total_mapped_file 40960   ->   40960
total_swap 0   ->   0
total_pgpgin 524174   ->   2189562
total_pgpgout 240930   ->   1885754
total_pgfault 838775   ->   6379585
total_pgmajfault 0   ->   0
total_inactive_anon 0   ->   0
total_active_anon 2187399168   ->   2268377088
total_inactive_file 28672   ->   69632
total_active_file 405504   ->   3629056
total_unevictable 4096   ->   4096
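The logging sketch mentioned above is roughly the following, run once per minute (the container name and cgroup path are placeholders):

# Sketch of a per-minute logger for docker stats, java RSS, and memory.stat.
CG=/sys/fs/cgroup/memory/kubepods/burstable/<pod-uid>/<container-id>
while true; do
    date
    docker stats --no-stream --format '{{.MemUsage}}' <container>
    ps -o rss= -C java        # java RSS in KB
    cat "$CG/memory.stat"
    sleep 60
done >> memory_log.txt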

I probably should have also been logging /sys/fs/cgroup/memory/kubepods/burstable/pod2e255ab3-624c-11e9-81f6-06c1f8d31d89/dd630103a0803f7934a14f3d3dc92c64b5442de6bd1aba23371bbc50affe3c1e/memory.usage_in_bytes, but it appears to grow in step with docker stats {{.MemUsage}}.

cat /sys/fs/cgroup/memory/kubepods/burstable/pod2e255ab3-624c-11e9-81f6-06c1f8d31d89/dd630103a0803f7934a14f3d3dc92c64b5442de6bd1aba23371bbc50affe3c1e/memory.usage_in_bytes
2896216064
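If I understand the kubernetes/cadvisor metric correctly, workingSetBytes is derived as memory.usage_in_bytes minus total_inactive_file, so both inputs can be checked directly (the cgroup path is again a placeholder):

# Assumed relationship: workingSetBytes = usage_in_bytes - total_inactive_file.
CG=/sys/fs/cgroup/memory/kubepods/burstable/<pod-uid>/<container-id>
usage=$(cat "$CG/memory.usage_in_bytes")
inactive_file=$(grep '^total_inactive_file ' "$CG/memory.stat" | awk '{print $2}')
echo "working set bytes: $((usage - inactive_file))"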

It is possible this is a duplicate of, or closely related to, https://github.com/docker/for-linux/issues/651, since the memory only grows when KairosDB does any disk writes to /tmp/kairos_cache. I am trying to get a better understanding of which memory usage is causing the OOM to know for sure. The total_active_file and cache values seem nowhere near large enough to account for the growth, and even if it were the file cache, the kernel should know to reclaim it when the space is needed.
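One way to test that reclaim assumption directly is the cgroup v1 memory.force_empty knob, which asks the kernel to reclaim as much as possible from the cgroup (it is normally used before removing a cgroup). If usage_in_bytes drops back toward the RSS afterwards, the growth was reclaimable page cache; if it does not, something else is being charged to the cgroup. The path is a placeholder and this needs root on the host:

# cgroup v1 only; run as root on the host. Path is a placeholder.
CG=/sys/fs/cgroup/memory/kubepods/burstable/<pod-uid>/<container-id>
cat "$CG/memory.usage_in_bytes"      # before
echo 0 > "$CG/memory.force_empty"    # trigger reclaim for this cgroup
cat "$CG/memory.usage_in_bytes"      # after: should drop if the growth was cache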

Output of docker version:

Docker version 18.03.1-ce, build 9ee9f40

Output of docker info:

Containers: 113
 Running: 54
 Paused: 0
 Stopped: 59
Images: 101
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.14.4.el7.x86_64
Operating System: Red Hat
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.26GiB
Name: REDACTED
ID: S4T6:XKL3:VZEI:2KE4:VXMG:MG3Z:KC6H:OS44:UY2T:XZFI:UBXA:LNFI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.)

free -m
              total        used        free      shared  buff/cache   available
Mem:          32007        6269        6040         483       19697       24328
Swap:          2047           2        2045
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20190124_408237 (JIT enabled, AOT enabled)
OpenJ9   - 9c77d86
OMR      - dad8ba7
IBM      - e2996d1)
JCL - 20190207_01 based on Oracle jdk8u201-b09
uname -a
Linux  REDACTED 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
rdzimmer-zz commented 5 years ago

Possibly related to https://github.com/kubernetes/kubernetes/issues/61937, https://github.com/kubernetes/kubernetes/issues/43916, and https://github.com/docker/for-linux/issues/651.

xigang commented 5 years ago

Same issue here.