NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova
Other
422 stars 117 forks source link

Unreadable file when crashing during append (checksums+parity enabled) #121

Open pcworld opened 2 years ago

pcworld commented 2 years ago

This report assumes that NOVA's protection features are enabled (nova.metadata_csum=1 nova.data_csum=1 nova.data_parity=1).

For setup, I have created a small file: echo -n test > /mnt/myfile && sync In the following, I assume that a crash occurs during appending to the file (write system call): echo -n testtesttesttesttesttest >> /mnt/myfile

I think there is a crash consistency bug that leads to the file becoming irrecoverably broken after recovery. The file will be empty and listed with size 0. nova_verify_entry_csum outputs the following error messages during mount and when listing(?) the directory: "both entry and its replica fail checksum verification" and "unable to repair entry errors". However, it appears that from user space perspective, the involved file operations do not return error codes, so an application reading the file assumes everything is in order and that the file would be empty. The file can still be written to, and its new contents are also readable, but after remounting the file system, it becomes empty again.

I think one of the possibilites that leads to this state is the following crash scenario: When data_csum || data_parity is true, nova_set_write_entry_updating is called in do_nova_inplace_file_write. In nova_set_write_entry_updating, if metadata_csum is true, the following happens (excerpt): https://github.com/NVSL/linux-nova/blob/593f927a78a6900d7cfec58199fb0a4a4fd1d646/fs/nova/log.c#L871-L873 I consider the crash state where entry->updating = 1 is persisted, but the checksum updated by nova_update_entry_csum(entry) is not persisted (that function does not call a fence), and further, the alter_entry updated by nova_update_alter_entry is only partially persisted by crashing during memcpy_to_pmem_nocache: https://github.com/NVSL/linux-nova/blob/593f927a78a6900d7cfec58199fb0a4a4fd1d646/fs/nova/checksum.c#L236 I assume that this leads to the crash state explained above.

I think there's also a second crash that leads to the same bug, namely in do_nova_inplace_file_write (dax.c l. 744) -> nova_inplace_update_write_entry (log.c l. 862) -> nova_inplace_update_log_entry, where if metadata_csum is set, the following is performed: https://github.com/NVSL/linux-nova/blob/593f927a78a6900d7cfec58199fb0a4a4fd1d646/fs/nova/log.c#L473-L480 When one follows nova_update_log_entry to nova_update_write_entry and there, the checksum update is not persisted, and again if there's a crash in the middle of copying the alter_entry updated by nova_update_alter_entry (this time called from the nova_inplace_update_log_entry snippet above), I think this leads to the same or similar crash state as outlined above.

Maybe making nova_update_entry_csum perform a fence at the end would fix these bugs, but I'm not sure if this helps in every case, and also I'm not sure if that would have serious performance implications. I will test if adding that fence will at least fix the crash paths that I have observed so far leading to this bug.

I can imagine that my report is probably difficult to follow, I could try to post a repro in the future, or upload a crash image that exhibits this bug.

Andiry commented 2 years ago

Not sure about the second case, but for the first case, if entry->updating is set, during recovery in nova_rebuild_handle_write_entry() we will reset the csum parity. Is there a way to reproduce it?

pcworld commented 2 years ago

The following patch on top of 593f927a78a6900d7cfec58199fb0a4a4fd1d646 can be used to reproduce the first case (should also work on master):

diff --git a/fs/nova/log.c b/fs/nova/log.c
index 750ff47d9124..8c5dd218b715 100644
--- a/fs/nova/log.c
+++ b/fs/nova/log.c
@@ -869,7 +869,6 @@ int nova_set_write_entry_updating(struct super_block *sb,
    unsigned long irq_flags = 0;
    nova_memunlock_range(sb, entry, sizeof(*entry), &irq_flags);
    entry->updating = set ? 1 : 0;
-   nova_update_entry_csum(entry);
    nova_update_alter_entry(sb, entry);
    nova_memlock_range(sb, entry, sizeof(*entry), &irq_flags);

So it can be reproduced a bit simpler than I originally wrote.

With this patch applied, the following shell commands reproduce the error:

/ # set -ex
/ # mount -tNOVA -oinit /dev/pmem0 /mnt && echo -n test > /mnt/myfile && sync &&
 echo -n testtesttesttesttesttest >> /mnt/myfile && echo success
+ mount -tNOVA -oinit /dev/pmem0 /mnt
nova: 1 cpus online
nova: nova_get_nvmm_info: dev pmem0, phys_addr 0x8000000, virt_addr 0xffff888008000000, size 10485760
nova: measure timing 0, metadata checksum 1, wprotect 0, data checksum 1, data parity 1, DRAM checksum 0
nova: Start NOVA snapshot cleaner thread.
nova: creating an empty nova of size 10485760
nova: NOVA initialization finish
nova: Current epoch id: 0
nova: Running snapshot cleaner thread
+ echo -n test
+ sync
+ echo -n testtesttesttesttesttest
nova error: 
nova_verify_entry_csum: both entry and its replica fail checksum verification
nova error: 
nova_verify_entry_csum: unable to repair entry errors
nova error: 
nova_verify_entry_csum: both entry and its replica fail checksum verification
nova error: 
nova_verify_entry_csum: unable to repair entry errors
nova: do_nova_inplace_file_write alloc blocks failed!, -22
/ # cat /mnt/myfile 
+ cat /mnt/myfile
nova error: 
nova_verify_entry_csum: both entry and its replica fail checksum verification
nova error: 
nova_verify_entry_csum: unable to repair entry errors
cat: read error: Input/output error

This bug is probably fixed by PR #129. But maybe it still helps you find another issue if the recovery code isn't working as intended as you suggest (re "if entry->updating is set, during recovery in nova_rebuild_handle_write_entry() we will reset the csum parity").

PS, I intend to respond on the other issues I have reported where you have asked for clarification, might still take a while though.

pcworld commented 2 years ago

The following further shows how the file can still be written to, and its contents can then be read, but after remounting the file system, the file becomes unreadable again (commands should be run after the shell commands from my previous comment):

/ # echo ho > /mnt/myfile
nova error: 
nova_verify_entry_csum: both entry and its replica fail checksum verification
nova error: 
nova_verify_entry_csum: unable to repair entry errors
/ # cat /mnt/myfile
ho
/ # umount /mnt
nova: Current epoch id: 0
nova: nova_save_inode_list_to_log: 1 inode nodes, pi head 0x375000, tail 0x375010
nova: nova_save_blocknode_mappings_to_log: 1 blocknodes, 1 log pages, pi head 0x376000, tail 0x376010
/ # mount -tNOVA /dev/pmem0 /mnt
nova: 1 cpus online
nova: nova_get_nvmm_info: dev pmem0, phys_addr 0x8000000, virt_addr 0xffff888008000000, size 10485760
nova: measure timing 0, metadata checksum 1, wprotect 0, data checksum 1, data parity 1, DRAM checksum 0
nova: Start NOVA snapshot cleaner thread.
nova: nova_init_inode_list_from_inode: 1 inode nodes
nova: Recovered 0 snapshots, latest epoch ID 0
nova: NOVA: Normal shutdown
nova: Current epoch id: 0
nova: Running snapshot cleaner thread
/ # cat /mnt/myfile
nova error: 
nova_verify_entry_csum: both entry and its replica fail checksum verification
nova error: 
nova_verify_entry_csum: unable to repair entry errors

I think this can be considered a bug in any case, even if the checksum persistence is fixed. If the file system is so broken that files cannot be read again after a remount, the corresponding write calls should fail, or alternatively, recovery should be successful. This issue could not only result from a crash, but also from other kinds of data corruption.

pcworld commented 2 years ago

This bug is probably fixed by PR #129.

I can not reproduce the I/O error anymore with PR #129. Leaving the issue open though for the reasons stated in my last two comments.