NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova
Other
421 stars 118 forks source link

Crash consistency bug + question about nova_flush_buffer use #94

Closed hayley-leblanc closed 3 years ago

hayley-leblanc commented 3 years ago

Hi,

Two things in this issue - an instance of a bug I discovered and some potential locations of bugs.

I have a potential crash consistency bug in NOVA's implementation of rename(). I believe if the operation involves creating a new inode, nova_rename() calls nova_append_link_change_entry(); this calls nova_append_log_entry(), which calls nova_get_append_head(), which calls nova_extend_inode_log(), which calls nova_initialize_inode_log(). nova_initialize_inode_log() contains the following code:

if (log_id == MAIN_LOG) {
    pi->log_tail = new_block;
    nova_flush_buffer(&pi->log_tail, CACHELINE_SIZE, 0);
    pi->log_head = new_block;
    sih->log_head = sih->log_tail = new_block;
    sih->log_pages = 1;
    nova_flush_buffer(&pi->log_head, CACHELINE_SIZE, 1);
}

I injected a crash between the first and third lines of this if statement such that the new value of pi->log_tail is persisted, but the value of pi->log_head is not. When I tried to mount the resulting file system image, the kernel crashed with a NULL pointer dereference. I believe the issue is in goto_next_page with the following lines:

addr = nova_get_block(sb, curr_p);
rc = memcpy_mcsafe(&type, addr, sizeof(u8));

because addr is set to 0 when this bug is hit.

I triggered this bug with the following workload: mkdir(A), creat(A/foo), close(A/foo), rename(A/foo, A/bar) with the crash injected in the rename() call. Can you confirm this bug? I don't currently have a patch but I'll try to make a PR if I get an opportunity to write one.

I also have a couple questions about some places in NOVA that appear to be missing calls to nova_flush_buffer() even though they involve modifying persistent inode fields. I don't have instances of bugs related to any of these at the moment, but I'm curious whether these may be bugs or whether the lack of nova_flush_buffer() is intentional (or if I just missed subsequent flushes elsewhere in the code :) )

Thanks!

Andiry commented 3 years ago

Thanks for the mail.

For the rename issue, note that NOVA updates tail pointer before head pointer. During the recovery process, the recovery thread should skip if it finds the head pointer is 0, meaning there is no log allocated. I am not quite sure why the rebuild process triggers NULL pointer in this case. More importantly, the rename() is protected by transaction. As the crash occurs during the rename() process, the recovery should provoke the transaction and restores the system back to the state before rename() begins. Better to check the transaction part to figure out if there is a bug (the rename process is very complicated. I will not be surprised if there is a bug).

For the nova_flush_buffer() issue, some of them do look like a bug to me. I can't recall the history, but it is possible that when we added NOVA-Fortis support, we replaced some nova_flush_buffer() with nova_update_inode_checksum(). However, nova_update_inode_checksum() skips flushing if metadata_csum is disabled. So perhaps we should fix nova_update_inode_checksum(), or add nova_flush_buffer() explicitly.

Thanks, Andiry

On Fri, Jan 29, 2021 at 12:11 PM hayley-leblanc notifications@github.com wrote:

Hi,

Two things in this issue - an instance of a bug I discovered and some potential locations of bugs.

I have a potential crash consistency bug in NOVA's implementation of rename(). I believe if the operation involves creating a new inode, nova_rename() calls nova_append_link_change_entry(); this calls nova_append_log_entry(), which calls nova_get_append_head(), which calls nova_extend_inode_log(), which calls nova_initialize_inode_log(). nova_initialize_inode_log() contains the following code:

if (log_id == MAIN_LOG) { pi->log_tail = new_block; nova_flush_buffer(&pi->log_tail, CACHELINE_SIZE, 0); pi->log_head = new_block; sih->log_head = sih->log_tail = new_block; sih->log_pages = 1; nova_flush_buffer(&pi->log_head, CACHELINE_SIZE, 1); }

I injected a crash between the first and third lines of this if statement such that the new value of pi->log_tail is persisted, but the value of pi->log_head is not. When I tried to mount the resulting file system image, the kernel crashed with a NULL pointer dereference. I believe the issue is in goto_next_page with the following lines:

addr = nova_get_block(sb, curr_p); rc = memcpy_mcsafe(&type, addr, sizeof(u8));

because addr is set to 0 when this bug is hit.

I triggered this bug with the following workload: mkdir(A), creat(A/foo), close(A/foo), rename(A/foo, A/bar) with the crash injected in the rename() call. Can you confirm this bug? I don't currently have a patch but I'll try to make a PR if I get an opportunity to write one.

I also have a couple questions about some places in NOVA that appear to be missing calls to nova_flush_buffer() even though they involve modifying persistent inode fields. I don't have instances of bugs related to any of these at the moment, but I'm curious whether these may be bugs or whether the lack of nova_flush_buffer() is intentional (or if I just missed subsequent flushes elsewhere in the code :) )

  • nova_free_inode_resource() in inode.c modifies pi->deleted and pi->valid without flushing pi
  • nova_fallocate() in file.c modifies pi->i_flags on line 330 without flushing it
  • nova_init_inode_table() in inode.c modifies of the inode table inode without flushing it
  • check_eof_blocks() in inode.c modifies pi->i_flags on line 380 without flushing it
  • nova_lite_transaction_for_time_and_link() in namei.c sets a couple pi fields around line 321 without flushing
  • nova_update_inode_with_rebuild() in rebuild.c sets a bunch of persistent inode fields without flushing them
  • nova_new_vfs_inode() in inode.c sets some persistent inode fields around line 1135 without flushing or fencing, although there's a comment indicating that they should be flushed.

Thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/NVSL/linux-nova/issues/94__;!!Mih3wA!XtUj5sGGwTeoG5u0TThnJ3Ld-91c6HIR8aZVBTR3ZsTxmuwxX1rVP3QelDruWec6Yk3N$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AAKBYEFC66GNAW3CUIETZXLS4MI55ANCNFSM4WZN7UMQ__;!!Mih3wA!XtUj5sGGwTeoG5u0TThnJ3Ld-91c6HIR8aZVBTR3ZsTxmuwxX1rVP3QelDruWQRwFgAe$ .

hayley-leblanc commented 3 years ago

Hi Andiry,

Thanks for your response! I took a look at nova_rename() and it looks like nova_create_rename_transaction() is called after the call to nova_append_link_change_entry() that appears buggy to me. Would this mean that the log update I'm finding to be buggy is not actually part of the rename transaction?

Andiry commented 3 years ago

Yes, nova_append_link_change_entry is outside nova_create_rename_transaction. We first append log entries to each inode impacted, then use nova_create_rename_transaction to atomically update all the inodes. When crash before nova_create_rename_transaction(), the logs do not have the tail pointer updated and hence the new appended entries are ignored.

However, nova_rename() may create a new inode. Here we should set new_pi->valid = 0 to make sure it is invalid at the moment. Then after reboot, the rebuild process should ignore this inode, and the inode's log should not be traversed. I think nova_rebuild_inode() should check pi->valid field, but turns out it only checks pi->deleted. I can't recall the logic here as it has been several years...

Thanks, Andiry

On Tue, Feb 2, 2021 at 12:51 AM hayley-leblanc notifications@github.com wrote:

Hi Andiry,

Thanks for your response! I took a look at nova_rename() and it looks like nova_create_rename_transaction() is called after the call to nova_append_link_change_entry() that appears buggy to me. Would this mean that the log update I'm finding to be buggy is not actually part of the rename transaction?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/NVSL/linux-nova/issues/94*issuecomment-771473783__;Iw!!Mih3wA!VLoMsvKx4QvMkrusfMuLRsZYe5ZGjmjml5XCOcrE6m13vKIqygVjTm-BzAXnMwp7p3w2$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AAKBYEE46MSUVYYTQ7ADHITS464HHANCNFSM4WZN7UMQ__;!!Mih3wA!VLoMsvKx4QvMkrusfMuLRsZYe5ZGjmjml5XCOcrE6m13vKIqygVjTm-BzAXnM2_pHTzM$ .

hayley-leblanc commented 3 years ago

I may have figured out the problem. My initial take on the issue was wrong - the null pointer deref is triggered when I add a crash during the call to nova_append_link_change_entry() on the old inode, not a new one. Sorry for the confusion there. I also didn't describe exactly what I did to trigger this bug; I found it by starting from an empty file system and creating a directory, then creating a file in that directory, and trying to rename the file. I think this bug can only be triggered if the file being renamed has just been created and not modified/written to yet, because it involves a crash during log initialization. It seems to me right now that the recovery code works fine as long as the old inode has a log to append to in the transaction, but if we crash while the log is being set up it doesn't work right - does that seem possible?

What I'm seeing happen is, if we rename a brand-new file, it results in the inode's log initialized in nova_initialize_inode_log(). If we crash during that function such that the inode's new log tail pointer has been persisted but the head tail pointer hasn't, we end up with one null log pointer and one non-null one. The recovery code in nova_traverse_file_inode_log() expects that either both log pointer fields will be null, or they'll both be non-null (it checks the condition if (curr_p == 0 && pi->log_tail == 0) where curr_p is initially a pointer to pi->log_head). So if it finds that they are not both null, it attempts to traverse the log, which is when the null pointer deref occurs, since one of the log pointers is null. nova_rebuild_file_inode_tree has the same pattern which will also trigger a null pointer deref if the one in nova_traverse_file_inode_log() is fixed.

I was able to prevent the null pointer dereference in recovery by changing the if (curr_p == 0 && pi->log_tail == 0) condition in nova_traverse_file_inode_log() (and the similar one in nova_rebuild_file_inode_tree() to if (curr_p == 0 || pi->log_tail == 0) but it looks like the recovery code still leaves the inode with mismatched log pointers, which probably would need to be fixed to prevent something else from going wrong later.

Andiry commented 3 years ago

Thanks for the analysis. It does sound like a bug. We should skip the log traverse if either pointer is nullptr. Perhaps change the condition as you suggested, and print out a warning message and set both pointers to nullptr in this case?

Thanks, Andiry

On Wed, Feb 10, 2021 at 3:17 PM hayley-leblanc notifications@github.com wrote:

I may have figured out the problem. My initial take on the issue was wrong

  • the null pointer deref is triggered when I add a crash during the call to nova_append_link_change_entry() on the old inode, not a new one. Sorry for the confusion there. I also didn't describe exactly what I did to trigger this bug; I found it by starting from an empty file system and creating a directory, then creating a file in that directory, and trying to rename the file. I think this bug can only be triggered if the file being renamed has just been created and not modified/written to yet, because it involves a crash during log initialization. It seems to me right now that the recovery code works fine as long as the old inode has a log to append to in the transaction, but if we crash while the log is being set up it doesn't work right - does that seem possible?

What I'm seeing happen is, if we rename a brand-new file, it results in the inode's log initialized in nova_initialize_inode_log(). If we crash during that function such that the inode's new log tail pointer has been persisted but the head tail pointer hasn't, we end up with one null log pointer and one non-null one. The recovery code in nova_traverse_file_inode_log() expects that either both log pointer fields will be null, or they'll both be non-null (it checks the condition if (curr_p == 0 && pi->log_tail == 0) where curr_p is initially a pointer to pi->log_head). So if it finds that they are not both null, it attempts to traverse the log, which is when the null pointer deref occurs, since one of the log pointers is null. nova_rebuild_file_inode_tree has the same pattern which will also trigger a null pointer deref if the one in nova_traverse_file_inode_log() is fixed.

I was able to prevent the null pointer dereference in recovery by changing the if (curr_p == 0 && pi->log_tail == 0) condition in nova_traverse_file_inode_log() (and the similar one in nova_rebuild_file_inode_tree() to if (curr_p == 0 || pi->log_tail == 0) but it looks like the recovery code still leaves the inode with mismatched log pointers, which probably would need to be fixed to prevent something else from going wrong later.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/NVSL/linux-nova/issues/94*issuecomment-777101119__;Iw!!Mih3wA!WDQb1iEhyUUwjHRXNjO2p1b68_zFVpL-TgH7h6VPgsO6j7FlF_ZHyQsNnzPwgKrlsCp6$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AAKBYEHMBBTLFV6OUNVEGPDS6MHXLANCNFSM4WZN7UMQ__;!!Mih3wA!WDQb1iEhyUUwjHRXNjO2p1b68_zFVpL-TgH7h6VPgsO6j7FlF_ZHyQsNnzPwgKLxiMm4$ .