Closed hayley-leblanc closed 3 years ago
Hi Hayley,
Thanks for the findings!
1), I assume you are referring to nova_append_dir_init_entries()? There used to be a nova_flush_buffer(). However, when implementing NOVA-fortis, I guess it returns early when metadata_csum==0. So a nova_flush_buffer() is still needed.
2) Yes you are right.
3) I think the logic is:
nova_create_inode_transaction(); // journal pi->valid pi->valid = 1; nova_commit_lite_transaction();
So if crash occurs before the transaction commits, it should be restored to 0. However, if the transaction succeeds, pi->valid is not flushed. Does it comply with your analysis? Also, for nova_flush_buffer() 1 byte should work as expected, but in other places we usually flushes CACHELINE_SIZE.
I really appreciate your help and expect your PR.
Thanks! For 1), yes, the bug is in nova_append_dir_init_entries()
. On 3) - yes, that matches with my findings.
I'll submit a PR shortly.
Hi NOVA folks,
I believe I have found a few crash consistency bugs in NOVA. They are all related to
mkdir()
. These were found by recording CLWB/CLFLUSHs, SFENCEs, and non-temporal stores, then re-creating the state of the file system immediately after a crash.Let's say we create a new directory named A/, fsync A/, and then crash.
The
log_head
field for the in-NVM inode for the A/ is never explicitly persisted, which can leave the inode with a null log head pointer. Attempts to read the parent directory or the new directory in the recreated crash state getnova error: Dir 34 log is NULL!
messages. Also, NOVA allows the creation of new files within A/ with no error, but those files are then inaccessible. I believe this bug can be fixed by adding a call tonova_flush_buffer(&(pi->log_head), CACHELINE_SIZE, 0)
tonova_append_dir_entries()
in dir.c.This crash can also potentially cause subsequent attempts to create files to fail. It seems as though NOVA attempts to assign new files the same inode number as was given to A/, which causes the file creation to fail. Attempting to create a new file in A/ after the crash gives the following error:
My testing tool doesn't find this bug anymore if I add
nova_flush_buffer(pi, sizeof(struct nova_inode), 0)
innova_free_inode_log()
at the end of theif (pi)
if statement.A/ can also be inaccessible and undeletable if the write to its inode's
valid
field innova_lite_transaction_for_new_inode()
doesn't make it to NVM by the time the crash occurs. Runningls -l
in A's parent directory in the recreated crash state gives the following output:and
dmesg
gives the following errors:Attempting to delete A also fails. I think this bug can be fixed by adding
nova_flush_buffer(&pi->valid, 1, 0)
innova_lite_transaction_for_new_inode()
, immediately after settingpi->valid = 1
.I'm happy to make a PR with my suggested additions, or a subset of them, if y'all would like. They fix these three issues in the specific
mkdir; fsync; crash
test case outlined above, but that's the only scenario in which I've tested them.Thanks!