NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova
Other
425 stars 118 forks source link

Errors with metadata_csum + data_csum + data parity #126

Open hayley-leblanc opened 3 years ago

hayley-leblanc commented 3 years ago

Hi Andiry,

I'm messing around with NOVA's metdata_csum, data_csum, and data parity and I'm running into two issues (in regular execution, not crash consistency). I'm mounting a fresh instance of NOVA at /mnt/pmem and running a program test2.zip that performs the following operations:

  1. Creates a 4096 buffer and memsets it to 'a'
  2. Creates a new file /mnt/pmem/foo
  3. Writes 4096 bytes from the buffer to foo using pwrite()
  4. Reads the 4096 bytes back into a new buffer using pread().

This program runs into two issues.

  1. There is a KASAN bug that occurs during the pwrite() with the following output:

    [   40.681069] ==================================================================
    [   40.682077] BUG: KASAN: slab-out-of-bounds in __crc32c_le_base+0x77/0x2a0
    [   40.682980] Read of size 4 at addr ffff8881e0afa100 by task a.out/335
    [   40.683833] 
    [   40.684049] CPU: 0 PID: 335 Comm: a.out Tainted: G            E     5.1.0+ #437
    [   40.685017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    [   40.686268] Call Trace:
    [   40.686651]  dump_stack+0x94/0xd8
    [   40.687159]  ? __crc32c_le_base+0x77/0x2a0
    [   40.687782]  print_address_description+0x78/0x290
    [   40.688489]  ? __crc32c_le_base+0x77/0x2a0
    [   40.689106]  ? __crc32c_le_base+0x77/0x2a0
    [   40.689723]  kasan_report+0x149/0x18c
    [   40.690246]  ? __crc32c_le_base+0x77/0x2a0
    [   40.690747]  __asan_load4+0x78/0x80
    [   40.691158]  __crc32c_le_base+0x77/0x2a0
    [   40.691619]  chksum_update+0x38/0x60
    [   40.692041]  crypto_shash_update+0x66/0x90
    [   40.692523]  crc32c+0xa6/0x120
    [   40.692890]  ? crc32c.part.0+0x10/0x10
    [   40.693333]  ? is_bpf_text_address+0x1a/0x30
    [   40.693835]  ? kernel_text_address+0xf4/0x100
    [   40.694351]  ? kasan_check_write+0x14/0x20
    [   40.694834]  ? _raw_spin_lock_irqsave+0x8e/0xf0
    [   40.695395]  ? _raw_write_lock_bh+0xd0/0xd0
    [   40.695887]  ? depot_save_stack+0x2e5/0x480
    [   40.696379]  ? save_stack+0xa3/0xd0
    [   40.696792]  ? save_stack+0x43/0xd0
    [   40.697207]  ? __kasan_slab_free+0x135/0x190
    [   40.697711]  ? kasan_slab_free+0xe/0x10
    [   40.698184]  ? kfree+0x98/0x1d0
    [   40.698513]  ? nova_update_block_csum_parity+0x570/0xf1f [nova]
    [   40.699100]  ? nova_protect_file_data+0xdad/0xe17 [nova]
    [   40.699627]  ? do_nova_inplace_file_write+0xf9e/0x17a4 [nova]
    [   40.700196]  ? nova_inplace_file_write+0xb2/0xdb [nova]
    [   40.700716]  nova_update_stripe_csum+0x57f/0x1213 [nova]
    [   40.701236]  ? kasan_check_write+0x14/0x20
    [   40.701648]  ? nova_repair_entry_pr+0x33e/0x33e [nova]
    [   40.702159]  ? depot_save_stack+0x2e5/0x480
    [   40.702572]  ? save_stack+0xa3/0xd0
    [   40.702926]  nova_update_block_csum+0xf9/0x1b9 [nova]
    [   40.703434]  ? nova_check_inode_integrity+0x5e9/0x5e9 [nova]
    [   40.703987]  ? __kasan_slab_free+0x14c/0x190
    [   40.704417]  ? nova_update_block_csum_parity+0x570/0xf1f [nova]
    [   40.705003]  nova_update_block_csum_parity+0xeec/0xf1f [nova]
    [   40.705574]  ? nova_update_pgoff_parity+0x2be/0x2be [nova]
    [   40.706129]  ? kasan_unpoison_shadow+0x35/0x50
    [   40.706573]  ? __kasan_kmalloc.constprop.8+0xa7/0xd0
    [   40.707068]  ? check_stack_object+0x56/0x100
    [   40.707487]  ? __virt_addr_valid+0xeb/0x130
    [   40.707898]  ? kasan_check_write+0x14/0x20
    [   40.708308]  nova_protect_file_data+0xdad/0xe17 [nova]
    [   40.708818]  ? nova_init_file_write_entry+0x12c/0x12c [nova]
    [   40.709375]  ? nova_new_data_blocks+0x10f/0x2a1 [nova]
    [   40.709883]  ? nova_new_data_blocks+0x159/0x2a1 [nova]
    [   40.710396]  ? nova_new_blocks+0xe44/0xe44 [nova]
    [   40.710861]  ? timespec64_trunc+0xb0/0xb0
    [   40.711263]  ? PERSISTENT_BARRIER+0xe/0x13 [nova]
    [   40.711731]  ? nova_flush_buffer+0x96/0xab [nova]
    [   40.712199]  do_nova_inplace_file_write+0xf9e/0x17a4 [nova]
    [   40.712752]  ? nova_dax_get_blocks+0xeab/0xeab [nova]
    [   40.713247]  ? is_bpf_text_address+0x1a/0x30
    [   40.713667]  ? kernel_text_address+0xf4/0x100
    [   40.714098]  ? __kernel_text_address+0x12/0x40
    [   40.714536]  ? unwind_get_return_address+0x36/0x50
    [   40.715007]  ? __save_stack_trace+0xa1/0x100
    [   40.715426]  ? aa_file_perm+0x1f1/0x720
    [   40.715805]  ? save_stack+0x43/0xd0
    [   40.716151]  ? __kasan_slab_free+0x135/0x190
    [   40.716571]  ? kasan_slab_free+0xe/0x10
    [   40.716949]  ? kmem_cache_free+0x7e/0x210
    [   40.717344]  ? aa_path_link+0x210/0x210
    [   40.717724]  ? mem_cgroup_charge_statistics+0x1f2/0x3a0
    [   40.718273]  ? kasan_check_write+0x14/0x20
    [   40.718681]  ? restore_nameidata+0xa9/0xc0
    [   40.719089]  ? apparmor_task_setrlimit+0x340/0x340
    [   40.719571]  nova_inplace_file_write+0xb2/0xdb [nova]
    [   40.720078]  nova_dax_file_write+0x8e/0xbf [nova]
    [   40.720551]  ? nova_cow_file_write+0x228/0x228 [nova]
    [   40.721051]  __vfs_write+0x60/0xc0
    [   40.721394]  vfs_write+0x119/0x2b0
    [   40.721735]  ksys_pwrite64+0xad/0xd0
    [   40.722106]  __x64_sys_pwrite64+0x5a/0x70
    [   40.722505]  do_syscall_64+0x7d/0x1b0
    [   40.722871]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   40.723513] RIP: 0033:0x7f6c46be8f43
    [   40.723958] Code: f0 ff ff 73 01 c3 48 8b 0d 52 ef 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 83 3d c9 47 2c 00 00 75 13 49 89 ca b8 12 00 00 00 0f 05 <48> 3d 014
    [   40.725867] RSP: 002b:00007ffe940836a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
    [   40.726615] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6c46be8f43
    [   40.727302] RDX: 0000000000001000 RSI: 00007ffe940846b0 RDI: 0000000000000003
    [   40.727988] RBP: 00007ffe940856c0 R08: 00007f6c47748960 R09: 00007f6c4773d740
    [   40.728675] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c3cbd38800
    [   40.729362] R13: 00007ffe940857a0 R14: 0000000000000000 R15: 0000000000000000
    [   40.730136] 
    [   40.730295] Allocated by task 335:
    [   40.730633]  save_stack+0x43/0xd0
    [   40.730961]  __kasan_kmalloc.constprop.8+0xa7/0xd0
    [   40.731429]  kasan_kmalloc+0x9/0x10
    [   40.731774]  __kmalloc+0x10f/0x230
    [   40.732121]  nova_protect_file_data+0x197/0xe17 [nova]
    [   40.732629]  do_nova_inplace_file_write+0xf9e/0x17a4 [nova]
    [   40.733181]  nova_inplace_file_write+0xb2/0xdb [nova]
    [   40.733681]  nova_dax_file_write+0x8e/0xbf [nova]
    [   40.734145]  __vfs_write+0x60/0xc0
    [   40.734481]  vfs_write+0x119/0x2b0
    [   40.734817]  ksys_pwrite64+0xad/0xd0
    [   40.735170]  __x64_sys_pwrite64+0x5a/0x70
    [   40.735564]  do_syscall_64+0x7d/0x1b0
    [   40.735935]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   40.736425] 
    [   40.736579] Freed by task 224:
    [   40.736882]  save_stack+0x43/0xd0
    [   40.737210]  __kasan_slab_free+0x135/0x190
    [   40.737612]  kasan_slab_free+0xe/0x10
    [   40.738006]  kfree+0x98/0x1d0
    [   40.738304]  kfree_link+0x1a/0x20
    [   40.738634]  vfs_readlink+0x148/0x240
    [   40.738997]  do_readlinkat+0x21c/0x250
    [   40.739369]  __x64_sys_readlinkat+0x5a/0x70
    [   40.739782]  do_syscall_64+0x7d/0x1b0
    [   40.740187]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   40.740676] 
    [   40.740832] The buggy address belongs to the object at ffff8881e0af9100
    [   40.740832]  which belongs to the cache kmalloc-4k of size 4096
    [   40.742088] The buggy address is located 0 bytes to the right of
    [   40.742088]  4096-byte region [ffff8881e0af9100, ffff8881e0afa100)
    [   40.743621] The buggy address belongs to the page:
    [   40.744234] page:ffffea000782be00 count:1 mapcount:0 mapping:ffff8881ed402700 index:0x0 compound_mapcount: 0
    [   40.745470] flags: 0x17ffffc0010200(slab|head)
    [   40.745958] raw: 0017ffffc0010200 dead000000000100 dead000000000200 ffff8881ed402700
    [   40.746704] raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
    [   40.747444] page dumped because: kasan: bad access detected
    [   40.747982] 
    [   40.748136] Memory state around the buggy address:
    [   40.748602]  ffff8881e0afa000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [   40.749297]  ffff8881e0afa080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [   40.749996] >ffff8881e0afa100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [   40.750690]                    ^
    [   40.751008]  ffff8881e0afa180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [   40.751702]  ffff8881e0afa200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [   40.752406] ==================================================================
  2. The pread() fails and prints out a big error:

    [  107.936020] nova: nova_verify_data_csum: nova data corruption detected! inode 33, strp 7 of 8, block offset 18329600, stripe nr 35807, csum calc 0xe2a1ca27, csum nvmm 0xb9bb73b0, csum nvmm replica 0xb9bb73b0
    [  107.939409] nova: nova_verify_data_csum: nova data recovery begins
    [  107.940842] nova: nova_verify_data_csum: nova data recovery fails!
    [  107.942130] CPU: 0 PID: 337 Comm: a.out Tainted: G    B       E     5.1.0+ #437
    [  107.943015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    [  107.943964] Call Trace:
    [  107.944226]  dump_stack+0x94/0xd8
    [  107.944570]  nova_verify_data_csum+0x88a/0xbf4 [nova]
    [  107.945078]  ? nova_update_pgoff_csum+0x2f0/0x2f0 [nova]
    [  107.945607]  ? common_interrupt+0xa/0xf
    [  107.945994]  ? __uart_start.isra.11+0xce/0xe0
    [  107.946434]  ? _raw_spin_unlock_irqrestore+0x15/0x20
    [  107.946927]  ? __radix_tree_lookup+0x13a/0x160
    [  107.947423]  do_dax_mapping_read+0x545/0xb48 [nova]
    [  107.948008]  ? nova_dax_read_iter+0x230/0x230 [nova]
    [  107.948503]  ? __wake_up_common+0x1e0/0x1e0
    [  107.948920]  ? __fsnotify_parent+0xa7/0x270
    [  107.949340]  nova_dax_file_read+0xfd/0x1c6 [nova]
    [  107.949812]  ? do_dax_mapping_read+0xb48/0xb48 [nova]
    [  107.950322]  ? security_file_permission+0x128/0x1d0
    [  107.950844]  ? do_dax_mapping_read+0xb48/0xb48 [nova]
    [  107.951352]  __vfs_read+0x60/0xc0
    [  107.951681]  vfs_read+0xd5/0x200
    [  107.952002]  ksys_pread64+0xad/0xd0
    [  107.952377]  __x64_sys_pread64+0x5a/0x70
    [  107.952773]  do_syscall_64+0x7d/0x1b0
    [  107.953139]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [  107.953705] RIP: 0033:0x7f34b5f68ee3
    [  107.954091] Code: 08 00 ba 93 01 00 00 e8 0b 20 f5 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 48 2c 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 0b b6 01 00 48 89 04 24
    [  107.955901] RSP: 002b:00007ffeeef5e528 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
    [  107.956633] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f34b5f68ee3
    [  107.957509] RDX: 0000000000001000 RSI: 00007ffeeef5f530 RDI: 0000000000000003
    [  107.958403] RBP: 00007ffeeef60540 R08: 00007f34b622a760 R09: 00007f34b6ce3b80
    [  107.959159] R10: 0000000000000000 R11: 0000000000000246 R12: 00005609953fe980
    [  107.959853] R13: 00007ffeeef60620 R14: 0000000000000000 R15: 0000000000000000
    [  107.960581] nova error: 
    [  107.960585] do_dax_mapping_read: nova data checksum and recovery fail! inode 33, offset 0, entry pgoff 0, 1 pages, pgoff 0
Andiry commented 3 years ago

Can you elaborate the repro steps a little bit? I have mounted NOVA with metdata_csum, data_csum, and data_parity, and run test2.cpp and there is no error.

hayley-leblanc commented 3 years ago

I'm working with NOVA on a QEMU/KVM virtual machine with 1 core running Ubuntu 20.04. I wonder if it's an artifact of the specific setup I'm using. I'll mess around with the setup and configurations a bit and get back to you.

hayley-leblanc commented 2 years ago

Here are more details about my setup and the steps to reproduce:

Setup: QEMU/KVM virtual machine running Ubuntu 20.04 and Linux kernel 5.1.0+ with KASAN. I initially ran into the issue on a VM with 8GB RAM, 128MB emulated persistent memory, and 1 core. I've also tried it out on 4GB of persistent memory and 4 cores and still run into the same issue. I'm building NOVA as a loadable module and manually loading it using insmod.

Steps to reproduce: I experimented a bit and found that both issues can be reproduced with only data_csum.

  1. Load NOVA with insmod path/to/nova/nova.ko data_csum=1
  2. Mount a fresh instance of NOVA with mount -t NOVA -o init /dev/pmem0 /mnt/pmem
  3. Run test2.cpp

I dug around in the code a bit and the second issue (the one causing pread() to fail for me) seems to occur in nova_verify_data_csum() when it's called during the pread(). The problem specifically only arises when the pread() is close to the size of the whole file; it seems to specifically be a problem with reading the very last stripe. I modified the for loop on line 758 of checksum.c to only iterate over strps-1 stripes and that allows pread() to complete successfully, but then it's not actually verifying everything. The KASAN bug appears to be hit on line 572 of checksum.c, in nova_update_stripe_csum(), but I don't know exactly why and I don't have a workaround for that one.

Andiry commented 2 years ago

I have tried again and cannot reproduce on a bare-metal machine. I do suspect this is related with VM and I have seen people reported issues on VM before. I think there are some kernel flags related to memory need to be disabled for NOVA running on VM, but I can't recall them - perhaps they are mentioned in some other issues. Can you try on a bare-metal machine to see if it is reproducible?

hayley-leblanc commented 2 years ago

I tried it on a bare-metal machine and was not able to replicate the issue there. I compared the execution on bare-metal vs. the VM and found that the issue comes from line 174 of parity.c: https://github.com/NVSL/linux-nova/blob/b817ca322e6fc61f532174e7effc4b6c81528e3f/fs/nova/parity.c#L174

unroll_csum is set to 0 on my VM and 1 on my bare-metal machine. This led to the following if statement evaluating to true on the VM but not on baremetal: https://github.com/NVSL/linux-nova/blob/b817ca322e6fc61f532174e7effc4b6c81528e3f/fs/nova/parity.c#L248-L250 The two problems described in this issue only occur when the contents of the if statement are run, which is why they aren't showing up on baremetal.

The root of the problem on the VM seems to be that by default, the virtualized CPU does not have the X86_FEATURE_XMM4_2 feature flag that is used to set unroll_csum. I was able to fix this on my VM by adding -cpu host to the qemu command that starts the VM.

This still seems like an bug to me, since it does appear that NOVA is intended to support CPUs without this feature.