NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

Possible crash consistency bug in truncate() in metadata_csum mode #127

Closed hayley-leblanc closed 2 years ago

hayley-leblanc commented 2 years ago

Hi Andiry,

I believe I've found a crash consistency bug that manifests in the truncate() system call in metadata_csum mode. Suppose we have the following sequence of operations in a fresh NOVA instance mounted on /mnt/pmem:

creat /mnt/pmem/foo
ln /mnt/pmem/foo /mnt/pmem/bar
truncate /mnt/pmem/foo to 0 bytes
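For reference, the workload above can be sketched as a small C program using the standard POSIX calls. This is only an illustration of the three operations, not the harness used to find the bug: the helper names `run_workload` and `link_count` are made up for this sketch, and the crash itself has to be injected by separate tooling.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: runs creat -> link -> truncate inside `dir`.
 * Returns 0 on success, -1 on the first failing call. */
int run_workload(const char *dir)
{
    char foo[4096], bar[4096];
    int n;

    n = snprintf(foo, sizeof(foo), "%s/foo", dir);
    if (n < 0 || (size_t)n >= sizeof(foo))
        return -1;
    n = snprintf(bar, sizeof(bar), "%s/bar", dir);
    if (n < 0 || (size_t)n >= sizeof(bar))
        return -1;

    /* creat <dir>/foo */
    int fd = creat(foo, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    /* ln <dir>/foo <dir>/bar */
    if (link(foo, bar) != 0)
        return -1;

    /* truncate <dir>/foo to 0 bytes (the crash is injected during this call) */
    if (truncate(foo, 0) != 0)
        return -1;

    return 0;
}

/* Hypothetical helper: returns the link count of `path`, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

Calling `run_workload("/mnt/pmem")` without a crash should leave foo with a link count of 2, which is the value `link_count()` would be expected to report after a correct recovery as well.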

If we crash during the truncate operation, it appears that foo's checksums become inconsistent. In attempting to repair them, NOVA ends up reverting or corrupting the link, leaving foo with a link count of 1 in the recovered state. The problematic crash states arise from a crash during nova_invalidate_reassign_logentry() (called by nova_invalidate_write_entry(), which is called by nova_free_old_entry() in nova_delete_file_tree()). In this function, the primary log entry is updated first by nova_update_entry_csum(), followed by the alternate log entry using nova_update_alter_entry(). There is no store fence between these operations, so the two updates can become persistent in either order.

If we crash after updating the alternate entry but before updating the primary entry, NOVA seems to be unable to repair the issue correctly. Mounting such a crash state gives the following errors:

[  120.164574] nova error: 
[  120.164576] nova_verify_entry_csum: both entry and its replica fail checksum verification
[  120.165880] nova error: 
[  120.165882] nova_verify_entry_csum: unable to repair entry errors

Attempting to stat foo produces the same errors. However, the stat does succeed, and it reports foo's link count as 1.

The issue also appears to manifest if we replace the link with other operations. Specifically, if we truncate foo to a larger size and write to it before the crashing truncate, NOVA reports the same checksum verification errors and reverts to a state from before the write.

I believe the solution is to add a fence to the call to nova_flush_buffer() at the end of nova_update_entry_csum(); this will ensure that the primary entry's checksum is updated before the alternate's. In my experiments, this resolves the issue. Happy to make a PR if you'd like.
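To illustrate why the fence matters, here is a minimal userspace model in C. It is not NOVA code: the struct and the functions `model_flush`, `model_fence`, and `alternate_only_state_reachable` are hypothetical stand-ins, and the model makes the simplifying assumption that flushed data may reach the media in any order until a fence is issued. Under that assumption it enumerates the states a crash can expose after both checksum updates have been issued.

```c
#include <stdbool.h>

/* Hypothetical two-slot model: slot 0 stands for the primary log entry's
 * checksum, slot 1 for the alternate (replica) entry's checksum. */
enum { PRIMARY = 0, ALTERNATE = 1, NSLOTS = 2 };
enum { OLD_CSUM = 0, NEW_CSUM = 1 };

struct pmem_model {
    int durable[NSLOTS];    /* value guaranteed to be on the media */
    int pending[NSLOTS];    /* value flushed but not yet fenced */
    bool in_flight[NSLOTS]; /* true while a flush still awaits a fence */
};

/* Assumption of this model: a fence makes every in-flight flush durable. */
void model_fence(struct pmem_model *m)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (m->in_flight[i]) {
            m->durable[i] = m->pending[i];
            m->in_flight[i] = false;
        }
    }
}

/* Stand-in for a flush helper that takes an optional trailing fence. */
void model_flush(struct pmem_model *m, int slot, int value, bool fence)
{
    m->pending[slot] = value;
    m->in_flight[slot] = true;
    if (fence)
        model_fence(m);
}

/* Returns true if a crash right after both updates are issued can expose
 * a state where the alternate is updated but the primary is still stale. */
bool alternate_only_state_reachable(bool fence_after_primary)
{
    struct pmem_model m = {
        { OLD_CSUM, OLD_CSUM }, { OLD_CSUM, OLD_CSUM }, { false, false }
    };

    model_flush(&m, PRIMARY, NEW_CSUM, fence_after_primary);
    model_flush(&m, ALTERNATE, NEW_CSUM, false);

    /* Each in-flight flush may or may not have landed before the crash,
     * so try every combination. */
    for (unsigned mask = 0; mask < (1u << NSLOTS); mask++) {
        int seen[NSLOTS];

        for (int i = 0; i < NSLOTS; i++) {
            bool landed = ((mask >> i) & 1u) && m.in_flight[i];
            seen[i] = landed ? m.pending[i] : m.durable[i];
        }
        if (seen[PRIMARY] == OLD_CSUM && seen[ALTERNATE] == NEW_CSUM)
            return true;
    }
    return false;
}
```

In this model, `alternate_only_state_reachable(false)` returns true and `alternate_only_state_reachable(true)` returns false, which matches the behavior I see in my experiments: once the primary update is fenced, the problematic crash state no longer occurs.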

Thanks!