akvo / akvo-reporting


reporting.test runs out of memory #36

Open · stellanl opened this issue 9 years ago

stellanl commented 9 years ago

Every few days the test environment server running ReportServer runs out of memory and (I assume) kills off a sacrificial process. This tends to be the tomcat7 process, as it is quite large. We need to determine whether RS has a memory leak or whether it simply needs more RAM.
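To tell the two apart, a rough sketch of what I have in mind (the PID lookup, the sampling interval and running jstat as the same user as the JVM are assumptions, adjust for the actual box): watch the heap over a few days and compare it with what the host actually has.

```sh
# Find the Tomcat JVM and check whether an explicit heap ceiling (-Xmx) is set.
TOMCAT_PID=$(pgrep -f 'tomcat7' | head -n1)   # assumes a single tomcat7 JVM on the box
ps -o pid,rss,vsz,args -p "$TOMCAT_PID" | tr ' ' '\n' | grep -E '^-Xm[sx]' \
  || echo "no explicit heap flags set"

# Sample GC/heap occupancy every 10 s (needs a JDK; run as the JVM's user).
# Old-gen usage that keeps creeping upward across full GCs over days suggests a
# leak in ReportServer; a heap that stays flat while the host swaps suggests
# the machine just has too little RAM.
jstat -gcutil "$TOMCAT_PID" 10000

# Host-side view at the same time: total/used memory and swap.
free -m
```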

dmesg output:

```
[1305873.755412] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[1305873.755415] java cpuset=/ mems_allowed=0
[1305873.755418] Pid: 23724, comm: java Not tainted 3.2.0-79-generic #115-Ubuntu
[1305873.755419] Call Trace:
[1305873.755426] [] dump_header+0x91/0xe0
[1305873.755429] [] oom_kill_process+0x85/0xb0
[1305873.755431] [] out_of_memory+0xfa/0x220
[1305873.755433] [] __alloc_pages_nodemask+0x8dc/0x8f0
[1305873.755438] [] ? noalloc_get_block_write+0x30/0x30
[1305873.755441] [] alloc_pages_current+0xb6/0x120
[1305873.755444] [] __page_cache_alloc+0xb7/0xd0
[1305873.755447] [] filemap_fault+0x234/0x3e0
[1305873.755450] [] ? xen_set_pte_at+0x39/0x210
[1305873.755453] [] __do_fault+0x72/0x550
[1305873.755458] [] ? error_exit+0x2a/0x60
[1305873.755461] [] ? retint_restore_args+0x5/0x6
[1305873.755463] [] ? pte_mfn_to_pfn+0x4d/0x110
[1305873.755465] [] handle_pte_fault+0xfa/0x200
[1305873.755467] [] ? xen_pmd_val+0xe/0x10
[1305873.755470] [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[1305873.755472] [] handle_mm_fault+0x269/0x370
[1305873.755475] [] do_page_fault+0x17e/0x540
[1305873.755479] [] ? hrtimer_start_range_ns+0x14/0x20
[1305873.755482] [] ? do_futex+0xd8/0x1b0
[1305873.755484] [] ? xen_clocksource_read+0x20/0x30
[1305873.755486] [] ? sys_futex+0x147/0x1a0
[1305873.755489] [] ? ktime_get_ts+0xad/0xe0
[1305873.755492] [] page_fault+0x25/0x30
[1305873.755493] Mem-Info:
[1305873.755494] Node 0 DMA per-cpu:
[1305873.755496] CPU 0: hi: 0, btch: 1 usd: 0
[1305873.755497] Node 0 DMA32 per-cpu:
[1305873.755499] CPU 0: hi: 186, btch: 31 usd: 40
[1305873.755501] active_anon:170375 inactive_anon:57141 isolated_anon:32
[1305873.755502] active_file:10 inactive_file:16 isolated_file:0
[1305873.755502] unevictable:8 dirty:4 writeback:0 unstable:0
[1305873.755503] free:3431 slab_reclaimable:2393 slab_unreclaimable:2421
[1305873.755504] mapped:17 shmem:10 pagetables:2564 bounce:0
[1305873.755505] Node 0 DMA free:8064kB min:32kB low:40kB high:48kB active_anon:1756kB inactive_anon:1876kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:11708kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:136kB slab_unreclaimable:0kB kernel_stack:24kB pagetables:4kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[1305873.755511] lowmem_reserve[]: 0 2008 2008 2008
[1305873.755514] Node 0 DMA32 free:5660kB min:5716kB low:7144kB high:8572kB active_anon:679744kB inactive_anon:226688kB active_file:40kB inactive_file:52kB unevictable:32kB isolated(anon):128kB isolated(file):0kB present:2056320kB mlocked:32kB dirty:16kB writeback:0kB mapped:64kB shmem:40kB slab_reclaimable:9436kB slab_unreclaimable:9684kB kernel_stack:2064kB pagetables:10252kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:148 all_unreclaimable? yes
[1305873.755520] lowmem_reserve[]: 0 0 0 0
[1305873.755523] Node 0 DMA: 2*4kB 9*8kB 5*16kB 3*32kB 2*64kB 2*128kB 3*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8064kB
[1305873.755529] Node 0 DMA32: 497*4kB 7*8kB 0*16kB 3*32kB 3*64kB 0*128kB 1*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 5660kB
[1305873.755536] 323 total pagecache pages
[1305873.755537] 297 pages in swap cache
[1305873.755538] Swap cache stats: add 1995533, delete 1995236, find 1616349/1654999
[1305873.755540] Free swap = 0kB
[1305873.755541] Total swap = 2097148kB
[1305873.758684] 526320 pages RAM
[1305873.758686] 277631 pages reserved
[1305873.758687] 804 pages shared
[1305873.758688] 244416 pages non-shared
[1305873.758689] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[1305873.758696] [ 342] 0 342 12509 30 0 -17 -1000 sshd
[1305873.758698] [ 347] 102 347 5988 64 0 0 0 dbus-daemon
[1305873.758700] [ 372] 101 372 62464 809 0 0 0 rsyslogd
[1305873.758702] [ 471] 0 471 3797 1 0 0 0 upstart-socket-
[1305873.758704] [ 695] 0 695 1082 1 0 0 0 acpid
[1305873.758706] [ 696] 0 696 4778 22 0 0 0 cron
[1305873.758709] [ 698] 0 698 4227 5 0 0 0 atd
[1305873.758711] [ 1205] 0 1205 14954 922 0 0 0 supervisord
[1305873.758713] [ 1230] 1017 1230 243375 11667 0 0 0 statsd /opt/sta
[1305873.758716] [ 1387] 999 1387 3300 8 0 0 0 n2txd
[1305873.758718] [ 1388] 999 1388 3448 142 0 0 0 n2txd
[1305873.758720] [ 1397] 0 1397 1073 0 0 0 0 collectdmon
[1305873.758722] [ 1442] 0 1442 3188 2 0 0 0 getty
[1305873.758724] [ 2486] 0 2486 146075 79 0 0 0 console-kit-dae
[1305873.758726] [ 2553] 0 2553 46644 113 0 0 0 polkitd
[1305873.758730] [23713] 107 23713 771970 131663 0 0 0 java
[1305873.758732] [12560] 0 12560 4307 1 0 0 0 upstart-udev-br
[1305873.758734] [12562] 0 12562 5364 1 0 -17 -1000 udevd
[1305873.758736] [12690] 0 12690 18311 100 0 0 0 nginx
[1305873.758739] [12691] 33 12691 18458 197 0 0 0 nginx
[1305873.758741] [12692] 33 12692 18543 295 0 0 0 nginx
[1305873.758743] [12693] 33 12693 18464 210 0 0 0 nginx
[1305873.758745] [12694] 33 12694 18428 177 0 0 0 nginx
[1305873.758747] [28049] 0 28049 6276 28 0 0 0 master
[1305873.758749] [28051] 105 28051 6833 45 0 0 0 qmgr
[1305873.758751] [28103] 0 28103 13037 109 0 0 0 /usr/sbin/munin
[1305873.758753] [15496] 0 15496 20489 216 0 0 0 sshd
[1305873.758755] [15706] 1012 15706 20489 212 0 0 0 sshd
[1305873.758757] [15707] 1012 15707 5845 732 0 0 0 bash
[1305873.758759] [17320] 0 17320 20488 213 0 0 0 sshd
[1305873.758762] [17532] 1003 17532 20488 213 0 0 0 sshd
[1305873.758764] [17533] 1003 17533 5845 734 0 0 0 bash
[1305873.758766] [21278] 105 21278 6792 70 0 0 0 pickup
[1305873.758768] [21519] 0 21519 11133 84 0 0 0 cron
[1305873.758770] [21521] 1015 21521 1099 26 0 0 0 sh
[1305873.758772] [21522] 1015 21522 3073 46 0 0 0 packages_availa
[1305873.758774] [21529] 1015 21529 3073 49 0 0 0 packages_availa
[1305873.758776] [21530] 1015 21530 21342 11490 0 0 0 apt-check
[1305873.758779] Out of memory: Kill process 23713 (java) score 217 or sacrifice child
[1305873.758798] Killed process 23713 (java) total-vm:3087880kB, anon-rss:526652kB, file-rss:0kB
```
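For tracking whether this keeps happening between logins, the kill events are easy to pull out of the kernel log after the fact; a small sketch (the log path is an assumption for a stock Ubuntu install):

```sh
# OOM-killer events since boot...
dmesg | grep -iE 'out of memory|killed process'
# ...and across reboots, from the persisted kernel log (path assumed for Ubuntu).
grep -iE 'out of memory|killed process' /var/log/kern.log* 2>/dev/null
```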

orifito commented 9 years ago

The log indicates the OOM killer chose the java process to kill in order to keep the OS running once memory was exhausted. The server just needs more RAM.
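Whatever size the box ends up with, it may also be worth capping the Tomcat heap below physical memory so the JVM fails with its own OutOfMemoryError rather than the kernel killing the whole process. A sketch only, assuming the stock Ubuntu tomcat7 package layout; the file location and the sizes below are placeholders for the resized machine:

```sh
# /etc/default/tomcat7 (assumed location; adjust sizes to the new RAM figure)
# Keep -Xmx comfortably below physical RAM to leave headroom for nginx, munin, sshd, etc.
JAVA_OPTS="-Djava.awt.headless=true -Xms256m -Xmx1024m -XX:+UseConcMarkSweepGC"
```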