I believe there is a crash consistency issue with link and unlink in metadata_csum mode. Suppose we run the following operations on a fresh NOVA file system mounted at /mnt/pmem:
creat /mnt/pmem/foo
ln /mnt/pmem/foo /mnt/pmem/bar
rm /mnt/pmem/bar
Based on the crash consistency tester I'm working on, it appears that if we crash during the remove operation, the unlink may not complete atomically. I have not dug into the root cause a huge amount, but it appears that the occurs at the beginning of nova_unlink(), if only a write to the primary inode in nova_append_link_change_entry() becomes persistent and none of the other preceding writes do. In the resulting crash state, /mnt/pmem/bar is present, and attempting to access it gives the following log message:
nova: nova_verify_entry_csum: entry replica 00000000d95b18c8 error, trying to repair using the primary
nova: nova_repair_entry: entry error repaired
nova: Failure recovery total recovered 2
However, the repaired version of /mnt/pmem/bar has a link count of 1, but /mnt/pmem/foo also exists with the same inode number.
I believe this issue can be fixed by just adding a store fence to the beginning of nova_append_link_change_entry() (or, if we don't want a store fence there, adding it directly before that call in nova_unlink()). Happy to make a PR if you think this is a bug. Thanks!
Edit: this issue feels similar to the one I reported in #119, but adding the store fence to nova_append_link_change_entry() does not fix that problem. I believe that issue stems from the linking logic, whereas this is just a PM programming mistake.
Hi,
I believe there is a crash consistency issue with link and unlink in metadata_csum mode. Suppose we run the following operations on a fresh NOVA file system mounted at /mnt/pmem:
Based on the crash consistency tester I'm working on, it appears that if we crash during the remove operation, the unlink may not complete atomically. I have not dug into the root cause a huge amount, but it appears that the occurs at the beginning of
nova_unlink()
, if only a write to the primary inode innova_append_link_change_entry()
becomes persistent and none of the other preceding writes do. In the resulting crash state,/mnt/pmem/bar
is present, and attempting to access it gives the following log message:However, the repaired version of
/mnt/pmem/bar
has a link count of 1, but/mnt/pmem/foo
also exists with the same inode number.I believe this issue can be fixed by just adding a store fence to the beginning of
nova_append_link_change_entry()
(or, if we don't want a store fence there, adding it directly before that call innova_unlink()
). Happy to make a PR if you think this is a bug. Thanks!Edit: this issue feels similar to the one I reported in #119, but adding the store fence to
nova_append_link_change_entry()
does not fix that problem. I believe that issue stems from the linking logic, whereas this is just a PM programming mistake.