NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

OOM with FxMark MWUL #77

Open yige-hu opened 4 years ago

yige-hu commented 4 years ago

Reproducible both on a raw machine and in a QEMU VM. MWCM and MWCL work on the raw machine, but MWUL crashes with the same setup. I'm not sure whether it's related to issue #1, since this is reproducible on a machine with large DRAM and only happens with the unlink workload.

Raw machine config: CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor; DRAM: 8*16G DDR4; DRAM emulated persistent memory: 10G, mounted with NOVA file system; OS: Ubuntu 18.04.3 LTS, with linux-nova 5.1.0+ kernel.

QEMU VM config: KVM enabled; CPU: 4 virtual CPU cores; DRAM: 1G; DRAM emulated persistent memory: 512M, mounted with NOVA file system; OS: Ubuntu 18.04.3 LTS, with linux-nova 5.1.0+ kernel.

In FxMark main function:

run_config = [
    (Runner.CORE_FINE_GRAIN,
     PerfMon.LEVEL_LOW,
     ("nvme", "*", "MWUL", "*", "directio")),
]

(I've ported NOVA mount for FxMark.)

I couldn't capture the kernel error message on the raw machine's console.

Kernel log from QEMU:

[ 1082.306675] fxmark invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 1082.326329] CPU: 0 PID: 12304 Comm: fxmark Tainted: G        W         5.1.0+ #1
[ 1082.330390] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 1082.336790] Call Trace:
[ 1082.338281]  dump_stack+0x85/0xcb
[ 1082.347124]  dump_header+0x57/0x550
[ 1082.348669]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 1082.351656]  oom_kill_process+0xb5/0x290
[ 1082.355074]  out_of_memory+0xf3/0x680
[ 1082.358184]  __alloc_pages_slowpath+0xc12/0xf70
[ 1082.362263]  ? find_held_lock+0x34/0xa0
[ 1082.364283]  __alloc_pages_nodemask+0x31c/0x390
[ 1082.366430]  pagecache_get_page+0xa5/0x2e0
[ 1082.368380]  filemap_fault+0x32c/0x8d0
[ 1082.370150]  ? ext4_filemap_fault+0x27/0x3e
[ 1082.372003]  ext4_filemap_fault+0x2f/0x3e
[ 1082.373753]  __do_fault+0x53/0x129
[ 1082.375279]  __handle_mm_fault+0xd0e/0x1110
[ 1082.377123]  __do_page_fault+0x34a/0x5b0
[ 1082.378852]  ? async_page_fault+0x8/0x30
[ 1082.380760]  async_page_fault+0x1e/0x30
[ 1082.382589] RIP: 0033:0x7f44805d3c8e
[ 1082.384410] Code: Bad RIP value.
[ 1082.385992] RSP: 002b:00007ffc5aa53a60 EFLAGS: 00010246
[ 1082.388492] RAX: 0000000000000003 RBX: 00007f4480ad2040 RCX: 00007f44805d3c8e
[ 1082.391805] RDX: 0000000000000042 RSI: 00007ffc5aa53ae0 RDI: 00000000ffffff9c
[ 1082.395144] RBP: 00007ffc5aa53ae0 R08: 0000000000000000 R09: 0000000000000000
[ 1082.398266] R10: 00000000000001c0 R11: 0000000000000246 R12: 000055ee52638c6f
[ 1082.401381] R13: 0000000000000000 R14: 00007f4480ac9000 R15: 000055ee5283b680
[ 1082.412623] Mem-Info:
[ 1082.415381] active_anon:1 inactive_anon:0 isolated_anon:0
[ 1082.415381]  active_file:0 inactive_file:19 isolated_file:0
[ 1082.415381]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1082.415381]  slab_reclaimable:77402 slab_unreclaimable:13589
[ 1082.415381]  mapped:32 shmem:0 pagetables:1169 bounce:0
[ 1082.415381]  free:1960 free_pcp:364 free_cma:0
[ 1082.448728] Node 0 active_anon:4kB inactive_anon:80kB active_file:36kB inactive_file:68kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:308kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1082.485340] Node 0 DMA free:3192kB min:100kB low:124kB high:148kB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:48kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1082.497058] lowmem_reserve[]: 0 367 367 367 367
[ 1082.499104] Node 0 DMA32 free:15684kB min:7800kB low:8400kB high:9000kB active_anon:4kB inactive_anon:84kB active_file:268kB inactive_file:2820kB unevictable:0kB writepending:0kB present:507760kB managed:383712kB mlocked:0kB kernel_stack:2176kB pagetables:4628kB bounce:0kB free_pcp:2524kB local_pcp:648kB free_cma:0kB
[ 1082.514127] lowmem_reserve[]: 0 0 0 0 0
[ 1082.515838] Node 0 DMA: 148*4kB (EH) 30*8kB (EH) 26*16kB (EH) 26*32kB (EH) 15*64kB (EH) 3*128kB (EH) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3680kB
[ 1082.525186] Node 0 DMA32: 0*4kB 313*8kB (UE) 226*16kB (UE) 301*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15752kB
[ 1082.543391] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1082.549303] 2435 total pagecache pages
[ 1082.550993] 61 pages in swap cache
[ 1082.552498] Swap cache stats: add 18883, delete 18822, find 813/1657
[ 1082.555320] Free swap  = 1825760kB
[ 1082.556820] Total swap = 1885180kB
[ 1082.558336] 130938 pages RAM
[ 1082.561099] 0 pages HighMem/MovableOnly
[ 1082.564939] 31033 pages reserved
[ 1082.567954] 0 pages cma reserved
[ 1082.570760] 0 pages hwpoisoned
[ 1082.573592] Unreclaimable slab info:
[ 1082.577436] Name                      Used          Total
[ 1082.582173] fib6_nodes                 5KB          8KB
[ 1082.586625] ip6_dst_cache              6KB         15KB
[ 1082.591069] RAWv6                     18KB         32KB
[ 1082.595439] UDPv6                      0KB         31KB
[ 1082.599784] TCPv6                      3KB         30KB
[ 1082.604048] scsi_sense_cache           2KB          8KB
[ 1082.606663] sd_ext_cdb                 0KB          7KB
[ 1082.608950] sgpool-128                 8KB         31KB
[ 1082.611266] sgpool-64                  4KB         31KB
[ 1082.613564] sgpool-32                  2KB         31KB
[ 1082.615895] sgpool-16                  1KB         15KB
[ 1082.618206] sgpool-8                   1KB         15KB
[ 1082.620497] mqueue_inode_cache          1KB         30KB
[ 1082.622855] fuse_request               0KB         15KB
[ 1082.625161] jbd2_inode                 3KB         31KB
[ 1082.627494] ext4_system_zone           5KB          7KB
[ 1082.629770] bio-1                      2KB         15KB
[ 1082.632107] posix_timers_cache          0KB         15KB
[ 1082.634815] UNIX                     247KB        270KB
[ 1082.637114] tcp_bind_bucket            1KB          8KB
[ 1082.639447] ip_fib_trie                3KB          7KB
[ 1082.641953] ip_fib_alias               3KB          7KB
[ 1082.644282] ip_dst_cache               8KB         15KB
[ 1082.646603] RAW                       14KB         30KB
[ 1082.648916] UDP                        6KB         61KB
[ 1082.651257] tw_sock_TCP                0KB         15KB
[ 1082.653524] request_sock_TCP           0KB         15KB
[ 1082.655824] TCP                       13KB         29KB
[ 1082.658098] hugetlbfs_inode_cache          2KB         30KB
[ 1082.660550] eventpoll_pwq             58KB         63KB
[ 1082.662848] eventpoll_epi             82KB         94KB
[ 1082.665150] inotify_inode_mark         56KB         63KB
[ 1082.667527] request_queue            100KB        114KB
[ 1082.669820] blkdev_ioc                35KB         47KB
[ 1082.672141] bio-0                     36KB         78KB
[ 1082.674441] biovec-max               284KB        403KB
[ 1082.676749] biovec-128                 0KB         31KB
[ 1082.679045] biovec-64                  0KB         94KB
[ 1082.681300] biovec-16                  0KB         47KB
[ 1082.683628] bio_integrity_payload          1KB         15KB
[ 1082.686068] uid_cache                  4KB         15KB
[ 1082.688347] dmaengine-unmap-256          2KB         31KB
[ 1082.690706] dmaengine-unmap-128          1KB         31KB
[ 1082.693047] dmaengine-unmap-16          0KB         15KB
[ 1082.695395] dmaengine-unmap-2          0KB          7KB
[ 1082.697667] audit_buffer               0KB          7KB
[ 1082.699992] skbuff_fclone_cache          0KB         15KB
[ 1082.702371] skbuff_head_cache          0KB         62KB
[ 1082.704670] file_lock_cache            3KB         46KB
[ 1082.706956] file_lock_ctx             16KB         30KB
[ 1082.709231] fsnotify_mark_connector         47KB         54KB
[ 1082.711788] shmem_inode_cache       1755KB       1759KB
[ 1082.714093] task_delay_info           65KB         76KB
[ 1082.716382] taskstats                  3KB         35KB
[ 1082.718671] proc_dir_entry           204KB        218KB
[ 1082.720940] pde_opener                 1KB         15KB
[ 1082.723263] seq_file                   2KB         46KB
[ 1082.725526] sigqueue                   0KB          7KB
[ 1082.727851] kernfs_iattrs_cache         46KB         47KB
[ 1082.730219] kernfs_node_cache      11447KB      11472KB
[ 1082.732508] mnt_cache                252KB        267KB
[ 1082.734945] filp                    8265KB       8277KB
[ 1082.737236] names_cache                8KB        128KB
[ 1082.739565] lsm_file_cache           421KB        472KB
[ 1082.741846] key_jar                   42KB         63KB
[ 1082.744161] nsproxy                    2KB          7KB
[ 1082.746459] vm_area_struct          2067KB       2135KB
[ 1082.749053] mm_struct                 76KB        123KB
[ 1082.751434] fs_cache                  22KB         31KB
[ 1082.753743] files_cache               50KB         93KB
[ 1082.756066] signal_cache             195KB        215KB
[ 1082.758359] sighand_cache            276KB        307KB
[ 1082.760636] task_struct              926KB       1008KB
[ 1082.762934] cred_jar                 104KB        189KB
[ 1082.765231] anon_vma_chain           967KB       1015KB
[ 1082.767566] anon_vma                 758KB        816KB
[ 1082.769850] pid                       66KB         80KB
[ 1082.772172] Acpi-Operand             190KB        199KB
[ 1082.774495] Acpi-ParseExt              0KB          7KB
[ 1082.776794] Acpi-Parse                 0KB          7KB
[ 1082.779112] Acpi-State                 0KB         30KB
[ 1082.781380] Acpi-Namespace           145KB        154KB
[ 1082.783688] numa_policy                1KB          7KB
[ 1082.786246] trace_event_file         571KB        574KB
[ 1082.788527] ftrace_event_field       1278KB       1283KB
[ 1082.790846] pool_workqueue            22KB         31KB
[ 1082.793161] task_group               104KB        123KB
[ 1082.796761] debug_objects_cache        423KB       5507KB
[ 1082.806313] page->ptl                512KB        581KB
[ 1082.812693] dma-kmalloc-512            0KB         15KB
[ 1082.818809] kmalloc-8k               741KB        849KB
[ 1082.823833] kmalloc-4k               718KB        787KB
[ 1082.826081] kmalloc-2k              2223KB       2269KB
[ 1082.828495] kmalloc-1k               916KB        956KB
[ 1082.830782] kmalloc-512             1500KB       1557KB
[ 1082.833084] kmalloc-256              272KB        280KB
[ 1082.835337] kmalloc-192              299KB        303KB
[ 1082.837543] kmalloc-128              273KB        292KB
[ 1082.839819] kmalloc-96               419KB        432KB
[ 1082.842046] kmalloc-64              1439KB       1468KB
[ 1082.844292] kmalloc-32              1138KB       1154KB
[ 1082.846498] kmalloc-16               917KB        925KB
[ 1082.848726] kmalloc-8                859KB        911KB
[ 1082.850946] kmem_cache_node          120KB        126KB
[ 1082.853143] kmem_cache               147KB        158KB
[ 1082.855376] Tasks state (memory values in pages):
[ 1082.857339] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 1082.861076] [    222]     0   222    23711     1637   192512      152             0 systemd-journal
[ 1082.865122] [    229]     0   229    26475        0    98304       69             0 lvmetad
[ 1082.868777] [    238]     0   238    10750        1   110592      360         -1000 systemd-udevd
[ 1082.872627] [    374] 62583   374    35481        0   184320      155             0 systemd-timesyn
[ 1082.876730] [    454]   100   454    20010        0   184320      182             0 systemd-network
[ 1082.881330] [    457]   101   457    17656        0   184320      170             0 systemd-resolve
[ 1082.885300] [    477]     0   477     7082        0   102400       53             0 atd
[ 1082.889424] [    479]     0   479    42273        1   225280     1941             0 networkd-dispat
[ 1082.896447] [    480]   103   480    12510        0   147456      170          -900 dbus-daemon
[ 1082.907430] [    482]     0   482    23884        0    86016       79             0 lxcfs
[ 1082.914342] [    485]     0   485    17643        0   176128      186             0 systemd-logind
[ 1082.923099] [    505]     0   505     7506        0    98304       75             0 cron
[ 1082.930180] [    506]   102   506    65758      100   163840      239             0 rsyslogd
[ 1082.939265] [    507]     0   507    71623       24   188416      255             0 accounts-daemon
[ 1082.948271] [    511]     0   511    27619        0   114688       84             0 irqbalance
[ 1082.958192] [    514]     0   514   289573        0   299008     3065          -900 snapd
[ 1082.966180] [    531]     0   531    18073        0   184320      189         -1000 sshd
[ 1082.974004] [    547]     0   547    46485        0   262144     1999             0 unattended-upgr
[ 1082.982779] [    560]     0   560    72219        0   204800      218             0 polkitd
[ 1082.990520] [    565]     0   565     3665        0    73728       37             0 agetty
[ 1082.999367] [    577]     0   577     3721        0    69632       37             0 agetty
[ 1083.008086] [    916]     0   916     3665        0    73728       36             0 getty
[ 1083.016685] [   1148]     0  1148    27534        1   262144      265             0 sshd
[ 1083.023280] [   1157]  1000  1157    19160        0   196608      279             0 systemd
[ 1083.031170] [   1158]  1000  1158    27995        0   249856      665             0 (sd-pam)
[ 1083.039290] [   1303]  1000  1303    27534        0   253952      265             0 sshd
[ 1083.046043] [   1304]  1000  1304     5395        1    86016      452             0 bash
[ 1083.052429] [  11976]     0 11976    27534        1   258048      264             0 sshd
[ 1083.059018] [  12061]  1000 12061    27534        0   249856      264             0 sshd
[ 1083.065966] [  12062]  1000 12062     5364        1    86016      438             0 bash
[ 1083.074148] [  12207]  1000 12207    12333        1   135168     1860             0 python3
[ 1083.083793] [  12303]  1000 12303     1156        0    53248       23             0 sh
[ 1083.092075] [  12304]  1000 12304     1114        0    53248       20             0 fxmark
[ 1083.100999] [  12305]  1000 12305     1114       12    53248       11             0 fxmark
[ 1083.110507] [  12306]  1000 12306     1114        0    53248       20             0 fxmark
[ 1083.120238] [  12308]  1000 12308     1114        0    53248       30             0 fxmark
....
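
For a rough reading of the Mem-Info counters above, the following sketch converts the page counts to MiB and computes how much of the usable RAM sits in slab. It assumes 4 KiB pages (consistent with the `4kB` entries in the buddy lists), and the helper name `pages_to_mib` and the dictionary layout are illustrative only; the numbers themselves are copied from the log.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages

# Counters copied from the QEMU kernel log above (all values are in pages).
mem_info = {
    "slab_reclaimable": 77402,
    "slab_unreclaimable": 13589,
    "free": 1960,
    "pages_ram": 130938,
    "pages_reserved": 31033,
}


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB under the PAGE_KIB assumption."""
    return pages * PAGE_KIB / 1024


# RAM actually managed by the page allocator: total minus reserved pages.
usable_pages = mem_info["pages_ram"] - mem_info["pages_reserved"]
slab_pages = mem_info["slab_reclaimable"] + mem_info["slab_unreclaimable"]
slab_share = slab_pages / usable_pages

for name, pages in mem_info.items():
    print(f"{name:20s} {pages:7d} pages  {pages_to_mib(pages):7.1f} MiB")
print(f"slab share of usable RAM: {slab_share:.0%}")
```

With these numbers, slab accounts for roughly 91% of the usable RAM in the 1G VM. The log alone does not show which reclaimable caches hold that memory, since only the unreclaimable caches are itemized.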
Andiry commented 4 years ago

The issue is that the FxMark MWUL workload first creates many small files to fill the whole pmem space, then removes (unlinks) them. NOVA is a hybrid file system and needs some DRAM for each file, so creating too many small files may result in OOM. This is an issue we want to fix, but I don't have time right now. A workaround is to limit the number of small files in the MWUL workload.
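
A minimal sketch of that workaround, written as a standalone Python script rather than as a patch to FxMark: it creates a bounded number of empty files and then unlinks them, instead of filling the device first. The function name `create_then_unlink`, the `max_files` cap, and the file-name pattern are hypothetical, and a suitable cap for a given amount of DRAM would have to be found experimentally.

```python
import os
import tempfile


def create_then_unlink(root: str, max_files: int) -> int:
    """Create up to max_files empty files under root, then unlink them all.

    Returns the number of files that were created and removed.
    """
    created = []
    for i in range(max_files):
        path = os.path.join(root, f"u_file-{i}")
        try:
            # O_EXCL so an existing file is never silently reused.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except OSError:
            break  # e.g. ENOSPC: stop early instead of filling the device
        os.close(fd)
        created.append(path)
    # Unlink phase: remove exactly the files created above.
    for path in created:
        os.unlink(path)
    return len(created)


if __name__ == "__main__":
    # Demo on a temporary directory; point root at the NOVA mount to try it there.
    with tempfile.TemporaryDirectory() as root:
        print(create_then_unlink(root, max_files=1000))
```

Capping by file count rather than by free space keeps the per-file DRAM cost bounded regardless of how large the pmem device is.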

Thanks, Andiry
