Crash consistency issue with truncate in data_csum mode

hayley-leblanc commented 3 years ago

Hi Andiry,

I believe I've found a crash consistency issue with the truncate() system call in NOVA's data_csum mode. It can be replicated using the following steps:

1. Add a line goto out; around line 1330 of inode.c (right after nova_handle_setattr_operation() in nova_notify_change()). This emulates a crash that prevents nova_setsize() from running.
2. Load NOVA with data_csum=1 and mount it at /mnt/pmem
3. Run a program that performs the following operations: create a file /mnt/pmem/foo, write 32 bytes to foo, truncate foo to 1 byte.
4. Use dd to copy the contents of the PM device to a separate file
5. Unmount NOVA and use dd to load the contents of the separate file back onto the PM device
6. Mount NOVA and try to read foo using cat /mnt/pmem/foo
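
A minimal sketch of the create/write/truncate sequence described above, assuming plain POSIX calls. It is illustrative only and is not the test program attached later in this thread; the function name write_then_truncate, the fill byte 'a', and the 0755 mode are assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, write 32 bytes to it, then
 * truncate it to 1 byte with truncate(2).
 * Returns 0 on success or a positive errno value on failure. */
int write_then_truncate(const char *path)
{
    char buf[32];
    memset(buf, 'a', sizeof(buf)); /* fill byte is an assumption */

    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0755);
    if (fd < 0)
        return errno;

    int err = 0;
    ssize_t n = write(fd, buf, sizeof(buf));
    if (n != (ssize_t)sizeof(buf))
        err = (n < 0) ? errno : EIO; /* treat a short write as an error */

    if (close(fd) != 0 && err == 0)
        err = errno;
    if (err != 0)
        return err;

    /* Shrink the file from 32 bytes to 1 byte. */
    if (truncate(path, 1) != 0)
        return errno;
    return 0;
}
```

Calling write_then_truncate("/mnt/pmem/foo") from a small main() on the patched kernel would perform the sequence; on an unpatched file system the file simply ends up 1 byte long.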

The attempt to read foo gives an input/output error and NOVA outputs the following error logs:

[  150.564193] nova: nova_verify_data_csum: nova data corruption detected! inode 33, strp 0 of 1, block offset 18329600, stripe nr 35800, csum calc 0x17615e49, csum nvmm 0xbd6f81f8, csum nvmm replica 0xbd6f81f8
[  150.570471] nova: nova_verify_data_csum: no data redundancy available, can not repair data corruption!
[  150.571872] nova error: 
[  150.571878] do_dax_mapping_read: nova data checksum and recovery fail! inode 33, offset 0, entry pgoff 0, 1 pages, pgoff 0
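
For scripted testing, the failing read can also be detected without cat. The sketch below is a hypothetical helper (the name read_all is an assumption, not part of NOVA or of any program in this thread) that reads a file to the end and returns the errno of the first failure; for the failure reported here that value would be EIO.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read `path` until EOF.
 * On success returns 0 and stores the number of bytes read in *total.
 * On failure returns the positive errno value of the failing call. */
int read_all(const char *path, long *total)
{
    char buf[4096];
    int err = 0;

    *total = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted, retry the read */
            err = errno;    /* e.g. EIO on a checksum failure */
            break;
        }
        if (n == 0)
            break;          /* end of file */
        *total += n;
    }

    close(fd);
    return err;
}
```

A caller can compare the return value against EIO to tell the checksum failure apart from unrelated errors such as ENOENT.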

As far as I have been able to tell, this issue seems to occur if we crash at any point after updating the tail pointer in nova_update_inode() (called by nova_handle_setattr_operation()) and before handling checksums in nova_update_truncated_block_csum(). I don't know the exact root cause or have a fix for this, but I'll take another look when I get a chance.

Thanks!

hayley-leblanc commented 3 years ago

I spent some time digging into this issue and have a bit more information. First, I think the specific error output I reported above might stem from the same issues described in #126. However, I think there may be a separate issue here impacting crash consistency in truncate.

In order to see this with the example program described above, I added print statements to nova_update_stripe_csum() so that the checksum calculated during the write() is printed. On my machine, that checksum is 0xbd6f81f8. When I inject the crash and remount the file system, stat /mnt/pmem/file0 gives

  File: /mnt/pmem/file0
  Size: 1           Blocks: 0          IO Block: 4096   regular file
Device: 10300h/66304d   Inode: 33          Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-12-03 22:36:05.000000000 +0000
Modify: 2021-12-03 22:36:05.000000000 +0000
Change: 2021-12-03 22:36:05.000000000 +0000
 Birth: -

i.e., the truncate has at least partially gone through because the size of the file has been updated. As shown above, cat /mnt/pmem/file0 shows that file0's calculated checksum is now 0x17615e49, but the stored checksum is still 0xbd6f81f8, which is why the checksum verification fails. It appears that the truncation operation is not atomic with the checksum updates, which causes the error here and makes the truncated file unreadable. I also tried to figure out where this bug STOPS occurring (i.e., is there a point in truncate() after which crashes do not cause this issue) and it looks like the issue is resolved when nova_update_truncated_block_csum() gets called (by nova_clear_last_page_tail(), which is called by nova_setsize()) - i.e., when file0's data checksums are brought into line with its other modifications.

Unfortunately, I don't have a fix or workaround for this issue; I think it could be pretty tricky to fix. The operations that would need to become atomic span multiple function calls (they start in nova_handle_setattr_operation() and finish in nova_setsize()) and there doesn't appear to already be a truncate transaction that they could be added to.

Andiry commented 2 years ago

Thanks for the report. Is there an easy way to reproduce it? The program you use, etc. Is the dd step required in step 4 and 5?

hayley-leblanc commented 2 years ago

I went back and tested it out and you don't need dd (although dd is a standard file-copying utility that should be installed by default - sorry for the lack of clarity on that part). The bug should occur if you just add the goto from step 1 to emulate a crash, mount NOVA in data_csum mode, run the program described in step 3, and then try to read the file.

Here is the program I am using in step 3 to make this bug manifest: test4.zip. It creates a file called file0 on NOVA, and trying to read it after following steps 1-3 should give the checksum verification error.

Andiry commented 2 years ago

I cannot reproduce the error with test4.cpp. After umount and mount, running cat on the file shows "a", without errors in dmesg. Is this related to the VM setup? Can you reproduce the issue on a bare-metal machine?

hayley-leblanc commented 2 years ago

I looked into this a bit more (although it still needs some more investigation). Like in #126, I was not able to reproduce the issue on baremetal and I was able to resolve it on a VM by using QEMU's -cpu host flag. I haven't had a chance to look at exactly what is different from the original VM setup vs. baremetal/with -cpu host, but I assume it's something similar to what I observed for the issue in #126. I can spend some more time looking into it if you consider this a real bug.

Andiry commented 2 years ago

Thanks. I have seen people encounter issues with NOVA on VMs, and some flags help to work around them. I am not sure what the issue really is, but if you find out, I am happy to apply a fix.

NVSL / linux-nova

Crash consistency issue with truncate in data_csum mode #131