Hi Andiry,

I believe I've found a crash consistency bug that manifests in the truncate() system call in metadata_csum mode. Suppose we have the following sequence of operations in a fresh NOVA instance mounted on /mnt/pmem:
creat /mnt/pmem/foo
ln /mnt/pmem/foo /mnt/pmem/bar
truncate /mnt/pmem/foo to 0 bytes
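For concreteness, the same workload can be sketched as a small C helper. This is only an illustration: the function name run_link_truncate_workload and its dir parameter are made up for this example (pass "/mnt/pmem" to match the steps above), and the crash itself still has to be injected separately while the truncate is in flight.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of NOVA): runs creat/link/truncate in `dir`.
 * Returns 0 on success, -1 on the first failing call. */
int run_link_truncate_workload(const char *dir)
{
    char foo[4096], bar[4096];
    int fd;

    assert(dir != NULL);

    /* Build <dir>/foo and <dir>/bar, rejecting paths that do not fit. */
    if (snprintf(foo, sizeof(foo), "%s/foo", dir) >= (int)sizeof(foo) ||
        snprintf(bar, sizeof(bar), "%s/bar", dir) >= (int)sizeof(bar))
        return -1;

    fd = creat(foo, 0644);          /* creat foo */
    if (fd < 0)
        return -1;
    close(fd);

    if (link(foo, bar) != 0)        /* ln foo bar: link count becomes 2 */
        return -1;

    if (truncate(foo, 0) != 0)      /* the truncate during which we crash */
        return -1;

    return 0;
}
```

On a run without a crash, foo and bar end up as two names for one empty file with a link count of 2.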
If we crash during the truncate operation, it appears that the checksums on foo's log entries are left inconsistent. In attempting to repair them, NOVA ends up reverting or corrupting the link, leaving foo with a link count of 1 in the recovered state. The problematic crash states arise from a crash during nova_invalidate_reassign_logentry() (called by nova_invalidate_write_entry(), which is called by nova_free_old_entry() in nova_delete_file_tree()). In this function, the primary log entry is updated first by nova_update_entry_csum(), followed by the alternate log entry using nova_update_alter_entry(). There is no store fence between these operations, so the two updates can reach persistent memory in either order.
If we crash after the alternate entry's update has become persistent but before the primary entry's has, NOVA seems to be unable to repair the entry correctly. Mounting such a crash state gives the following errors:
[ 120.164574] nova error:
[ 120.164576] nova_verify_entry_csum: both entry and its replica fail checksum verification
[ 120.165880] nova error:
[ 120.165882] nova_verify_entry_csum: unable to repair entry errors
Attempting to stat foo produces the same errors. However, the stat DOES succeed, and reports foo's link count as 1 rather than the expected 2.
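Since the link completes before the truncate starts, a correct recovery should still show foo and bar as two names for the same inode. A minimal checker sketch for that invariant follows; the name link_survived and the dir parameter are assumptions made for illustration, not anything provided by NOVA.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical checker: returns 1 if <dir>/foo and <dir>/bar both exist,
 * refer to the same inode, and that inode has link count 2; else 0. */
int link_survived(const char *dir)
{
    char foo[4096], bar[4096];
    struct stat sf, sb;

    assert(dir != NULL);

    if (snprintf(foo, sizeof(foo), "%s/foo", dir) >= (int)sizeof(foo) ||
        snprintf(bar, sizeof(bar), "%s/bar", dir) >= (int)sizeof(bar))
        return 0;

    /* A missing name or a failing stat counts as a lost link. */
    if (stat(foo, &sf) != 0 || stat(bar, &sb) != 0)
        return 0;

    return sf.st_dev == sb.st_dev &&
           sf.st_ino == sb.st_ino &&
           sf.st_nlink == 2;
}
```

Run against the remounted file system after the crash, this would return 0 for the recovered state described here, because foo's link count comes back as 1.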
The issue also appears to manifest if we replace the link with other operations; specifically, it appears that if we truncate foo to a larger size and write to it before the crashing truncate, NOVA will report the same checksum verification errors and revert foo to its state before the write.
I believe the solution is to add a fence to the call to nova_flush_buffer() at the end of nova_update_entry_csum(); this will ensure that the primary entry's checksum update is persistent before the alternate entry is updated. In my experiments, this resolves the issue. Happy to make a PR if you'd like.
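To illustrate why the fence matters, here is a toy user-space model of which crash states are reachable with and without it. It is a deliberate simplification and not NOVA code: each "store and flush" is assumed to be free to reach persistent memory or not until the next fence, and all names (crash_state_reachable, the op struct, PRIMARY, ALTERNATE) are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum op_kind { OP_STORE_FLUSH, OP_FENCE };

struct op {
    enum op_kind kind;
    unsigned bit;               /* which update this store represents */
};

#define PRIMARY   1u            /* primary entry checksum update */
#define ALTERNATE 2u            /* alternate entry update */

/* Current ordering: two flushed stores with no fence in between. */
const struct op unfenced[] = {
    { OP_STORE_FLUSH, PRIMARY },
    { OP_STORE_FLUSH, ALTERNATE },
};

/* Proposed ordering: fence after the primary update. */
const struct op fenced[] = {
    { OP_STORE_FLUSH, PRIMARY },
    { OP_FENCE, 0 },
    { OP_STORE_FLUSH, ALTERNATE },
};

/* Returns 1 if a crash at some point in `prog` can leave exactly the
 * updates in `target` persistent, 0 otherwise. */
int crash_state_reachable(const struct op *prog, size_t n, unsigned target)
{
    unsigned durable = 0;       /* guaranteed persistent (fenced) */
    unsigned pending = 0;       /* issued, but not yet fenced */

    assert(prog != NULL || n == 0);

    for (size_t i = 0; i <= n; i++) {
        /* Crash before op i: all durable updates plus any subset of pending. */
        if ((target & durable) == durable &&
            (target & ~(durable | pending)) == 0)
            return 1;
        if (i == n)
            break;
        if (prog[i].kind == OP_STORE_FLUSH) {
            pending |= prog[i].bit;
        } else {                /* OP_FENCE: everything pending becomes durable */
            durable |= pending;
            pending = 0;
        }
    }
    return 0;
}
```

In this model, crash_state_reachable(unfenced, 2, ALTERNATE) returns 1, meaning the alternate update can be persistent without the primary one, while crash_state_reachable(fenced, 3, ALTERNATE) returns 0, so the fence rules that state out.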
Thanks!