Open yige-hu opened 4 years ago
The issue is that the FxMark MWUL workload first creates many small files to fill the entire pmem space, then removes (unlinks) them. NOVA is a hybrid filesystem and needs DRAM for each file, so creating too many small files may result in OOM. This is an issue we want to fix, but I don't have time now... A workaround would be to limit the number of small files in the MWUL workload.
Thanks, Andiry
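As a rough illustration of the workaround suggested above, here is a minimal Python sketch of a fill-then-unlink loop with an upper bound on the number of files. It is not FxMark's actual MWUL code (FxMark's workloads are not shown in this thread); the function name `create_then_unlink`, the `u_file_` prefix, and the `max_files` parameter are all hypothetical.

```python
import errno
import os


def create_then_unlink(root, file_size=4096, max_files=1000):
    """Create small files under `root`, then unlink them all.

    Creation stops when the filesystem is full (ENOSPC) or when
    `max_files` files exist, whichever comes first. The cap is the
    hypothetical knob that bounds per-file DRAM usage.
    Returns the number of files that were created and unlinked.
    """
    payload = b"\0" * file_size
    created = []
    for i in range(max_files):
        path = os.path.join(root, f"u_file_{i}")
        try:
            with open(path, "wb") as f:
                f.write(payload)
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
            # Filesystem is full: drop the partial file and stop creating.
            if os.path.exists(path):
                os.unlink(path)
            break
        created.append(path)

    # Unlink phase: remove every file that was fully created.
    for path in created:
        os.unlink(path)
    return len(created)
```

A suitable value for `max_files` would have to be found experimentally for a given DRAM size; the thread does not state NOVA's per-file DRAM cost.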
On Sun, Oct 20, 2019 at 6:06 PM Yige Hu notifications@github.com wrote:
Reproducible both on a raw machine and in a QEMU VM. MWCM and MWCL work on the raw machine, but MWUL crashes with the same setup. Not sure if it's related to issue #1 https://github.com/NVSL/linux-nova/issues/1 since this only happens on the unlink workload.
Raw machine config: CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor; DRAM: 8*16G DDR4; DRAM emulated persistent memory: 10G, mounted with NOVA file system; OS: Ubuntu 18.04.3 LTS, with linux-nova 5.1.0+ kernel.
QEMU VM config: KVM enabled; CPU: 4 virtual CPU cores; DRAM: 1G; DRAM emulated persistent memory: 512M, mounted with NOVA file system; OS: Ubuntu 18.04.3 LTS, with linux-nova 5.1.0+ kernel.
In FxMark main function:
run_config = [ (Runner.CORE_FINE_GRAIN, PerfMon.LEVEL_LOW, ("nvme", "", "MWUL", "", "directio")), ]
(I've ported NOVA mount for FxMark.)
I can't capture the kernel error message on the raw machine's console.
Kernel log from QEMU:
[ 1082.306675] fxmark invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 1082.326329] CPU: 0 PID: 12304 Comm: fxmark Tainted: G W 5.1.0+ #1
[ 1082.330390] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 1082.336790] Call Trace:
[ 1082.338281] dump_stack+0x85/0xcb
[ 1082.347124] dump_header+0x57/0x550
[ 1082.348669] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 1082.351656] oom_kill_process+0xb5/0x290
[ 1082.355074] out_of_memory+0xf3/0x680
[ 1082.358184] __alloc_pages_slowpath+0xc12/0xf70
[ 1082.362263] ? find_held_lock+0x34/0xa0
[ 1082.364283] __alloc_pages_nodemask+0x31c/0x390
[ 1082.366430] pagecache_get_page+0xa5/0x2e0
[ 1082.368380] filemap_fault+0x32c/0x8d0
[ 1082.370150] ? ext4_filemap_fault+0x27/0x3e
[ 1082.372003] ext4_filemap_fault+0x2f/0x3e
[ 1082.373753] __do_fault+0x53/0x129
[ 1082.375279] handle_mm_fault+0xd0e/0x1110
[ 1082.377123] do_page_fault+0x34a/0x5b0
[ 1082.378852] ? async_page_fault+0x8/0x30
[ 1082.380760] async_page_fault+0x1e/0x30
[ 1082.382589] RIP: 0033:0x7f44805d3c8e
[ 1082.384410] Code: Bad RIP value.
[ 1082.385992] RSP: 002b:00007ffc5aa53a60 EFLAGS: 00010246
[ 1082.388492] RAX: 0000000000000003 RBX: 00007f4480ad2040 RCX: 00007f44805d3c8e
[ 1082.391805] RDX: 0000000000000042 RSI: 00007ffc5aa53ae0 RDI: 00000000ffffff9c
[ 1082.395144] RBP: 00007ffc5aa53ae0 R08: 0000000000000000 R09: 0000000000000000
[ 1082.398266] R10: 00000000000001c0 R11: 0000000000000246 R12: 000055ee52638c6f
[ 1082.401381] R13: 0000000000000000 R14: 00007f4480ac9000 R15: 000055ee5283b680
[ 1082.412623] Mem-Info:
[ 1082.415381] active_anon:1 inactive_anon:0 isolated_anon:0
[ 1082.415381] active_file:0 inactive_file:19 isolated_file:0
[ 1082.415381] unevictable:0 dirty:0 writeback:0 unstable:0
[ 1082.415381] slab_reclaimable:77402 slab_unreclaimable:13589
[ 1082.415381] mapped:32 shmem:0 pagetables:1169 bounce:0
[ 1082.415381] free:1960 free_pcp:364 free_cma:0
[ 1082.448728] Node 0 active_anon:4kB inactive_anon:80kB active_file:36kB inactive_file:68kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:308kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1082.485340] Node 0 DMA free:3192kB min:100kB low:124kB high:148kB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:48kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1082.497058] lowmem_reserve[]: 0 367 367 367 367
[ 1082.499104] Node 0 DMA32 free:15684kB min:7800kB low:8400kB high:9000kB active_anon:4kB inactive_anon:84kB active_file:268kB inactive_file:2820kB unevictable:0kB writepending:0kB present:507760kB managed:383712kB mlocked:0kB kernel_stack:2176kB pagetables:4628kB bounce:0kB free_pcp:2524kB local_pcp:648kB free_cma:0kB
[ 1082.514127] lowmem_reserve[]: 0 0 0 0 0
[ 1082.515838] Node 0 DMA: 148*4kB (EH) 30*8kB (EH) 26*16kB (EH) 26*32kB (EH) 15*64kB (EH) 3*128kB (EH) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3680kB
[ 1082.525186] Node 0 DMA32: 0*4kB 313*8kB (UE) 226*16kB (UE) 301*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15752kB
[ 1082.543391] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1082.549303] 2435 total pagecache pages
[ 1082.550993] 61 pages in swap cache
[ 1082.552498] Swap cache stats: add 18883, delete 18822, find 813/1657
[ 1082.555320] Free swap = 1825760kB
[ 1082.556820] Total swap = 1885180kB
[ 1082.558336] 130938 pages RAM
[ 1082.561099] 0 pages HighMem/MovableOnly
[ 1082.564939] 31033 pages reserved
[ 1082.567954] 0 pages cma reserved
[ 1082.570760] 0 pages hwpoisoned
[ 1082.573592] Unreclaimable slab info: ....
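To put the Mem-Info counters in the QEMU log into perspective, the following Python sketch works out what share of usable memory was held by the slab allocator at the time of the OOM. The page counts are copied from the log; the 4 KiB page size is an assumption (the x86-64 default), and `slab_share` is a hypothetical helper name.

```python
import re

# Counters copied from the Mem-Info section of the QEMU kernel log.
MEMINFO = """slab_reclaimable:77402 slab_unreclaimable:13589
130938 pages RAM
31033 pages reserved"""

PAGE_KIB = 4  # assumption: 4 KiB base pages (x86-64 default)


def slab_share(text):
    """Return (slab_pages, usable_pages, slab fraction of usable pages)."""
    counters = {k: int(v) for k, v in re.findall(r"(\w+):(\d+)", text)}
    ram = int(re.search(r"(\d+) pages RAM", text).group(1))
    reserved = int(re.search(r"(\d+) pages reserved", text).group(1))
    slab = counters["slab_reclaimable"] + counters["slab_unreclaimable"]
    usable = ram - reserved
    return slab, usable, slab / usable


slab, usable, frac = slab_share(MEMINFO)
print(f"slab: {slab} pages ({slab * PAGE_KIB} KiB), "
      f"usable: {usable} pages, share: {frac:.1%}")
```

By this count, slab accounts for 90991 of 99905 usable pages, roughly 91%, while anonymous and file pages are close to zero. That is consistent with kernel-side per-file structures exhausting DRAM, although the log alone does not show which slab caches are responsible, since the unreclaimable slab listing is elided.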
Reproducible both on a raw machine and in a QEMU VM. MWCM and MWCL work on the raw machine, but MWUL crashes with the same setup. Not sure whether it's related to issue #1, since this is reproducible on a machine with large DRAM and only happens with the unlink workload.