Closed hayley-leblanc closed 3 years ago
Thanks for the mail.
For the rename issue, note that NOVA updates tail pointer before head pointer. During the recovery process, the recovery thread should skip if it finds the head pointer is 0, meaning there is no log allocated. I am not quite sure why the rebuild process triggers NULL pointer in this case. More importantly, the rename() is protected by transaction. As the crash occurs during the rename() process, the recovery should provoke the transaction and restores the system back to the state before rename() begins. Better to check the transaction part to figure out if there is a bug (the rename process is very complicated. I will not be surprised if there is a bug).
For the nova_flush_buffer() issue, some of them do look like a bug to me. I can't recall the history, but it is possible that when we added NOVA-Fortis support, we replaced some nova_flush_buffer() with nova_update_inode_checksum(). However, nova_update_inode_checksum() skips flushing if metadata_csum is disabled. So perhaps we should fix nova_update_inode_checksum(), or add nova_flush_buffer() explicitly.
Thanks, Andiry
On Fri, Jan 29, 2021 at 12:11 PM hayley-leblanc notifications@github.com wrote:
Hi,
Two things in this issue - an instance of a bug I discovered and some potential locations of bugs.
I have a potential crash consistency bug in NOVA's implementation of rename(). I believe if the operation involves creating a new inode, nova_rename() calls nova_append_link_change_entry(); this calls nova_append_log_entry(), which calls nova_get_append_head(), which calls nova_extend_inode_log(), which calls nova_initialize_inode_log(). nova_initialize_inode_log() contains the following code:
if (log_id == MAIN_LOG) { pi->log_tail = new_block; nova_flush_buffer(&pi->log_tail, CACHELINE_SIZE, 0); pi->log_head = new_block; sih->log_head = sih->log_tail = new_block; sih->log_pages = 1; nova_flush_buffer(&pi->log_head, CACHELINE_SIZE, 1); }
I injected a crash between the first and third lines of this if statement such that the new value of pi->log_tail is persisted, but the value of pi->log_head is not. When I tried to mount the resulting file system image, the kernel crashed with a NULL pointer dereference. I believe the issue is in goto_next_page with the following lines:
addr = nova_get_block(sb, curr_p); rc = memcpy_mcsafe(&type, addr, sizeof(u8));
because addr is set to 0 when this bug is hit.
I triggered this bug with the following workload: mkdir(A), creat(A/foo), close(A/foo), rename(A/foo, A/bar) with the crash injected in the rename() call. Can you confirm this bug? I don't currently have a patch but I'll try to make a PR if I get an opportunity to write one.
I also have a couple questions about some places in NOVA that appear to be missing calls to nova_flush_buffer() even though they involve modifying persistent inode fields. I don't have instances of bugs related to any of these at the moment, but I'm curious whether these may be bugs or whether the lack of nova_flush_buffer() is intentional (or if I just missed subsequent flushes elsewhere in the code :) )
- nova_free_inode_resource() in inode.c modifies pi->deleted and pi->valid without flushing pi
- nova_fallocate() in file.c modifies pi->i_flags on line 330 without flushing it
- nova_init_inode_table() in inode.c modifies of the inode table inode without flushing it
- check_eof_blocks() in inode.c modifies pi->i_flags on line 380 without flushing it
- nova_lite_transaction_for_time_and_link() in namei.c sets a couple pi fields around line 321 without flushing
- nova_update_inode_with_rebuild() in rebuild.c sets a bunch of persistent inode fields without flushing them
- nova_new_vfs_inode() in inode.c sets some persistent inode fields around line 1135 without flushing or fencing, although there's a comment indicating that they should be flushed.
Thanks!
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/NVSL/linux-nova/issues/94__;!!Mih3wA!XtUj5sGGwTeoG5u0TThnJ3Ld-91c6HIR8aZVBTR3ZsTxmuwxX1rVP3QelDruWec6Yk3N$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AAKBYEFC66GNAW3CUIETZXLS4MI55ANCNFSM4WZN7UMQ__;!!Mih3wA!XtUj5sGGwTeoG5u0TThnJ3Ld-91c6HIR8aZVBTR3ZsTxmuwxX1rVP3QelDruWQRwFgAe$ .
Hi Andiry,
Thanks for your response! I took a look at nova_rename()
and it looks like nova_create_rename_transaction()
is called after the call to nova_append_link_change_entry()
that appears buggy to me. Would this mean that the log update I'm finding to be buggy is not actually part of the rename transaction?
Yes, nova_append_link_change_entry is outside nova_create_rename_transaction. We first append log entries to each inode impacted, then use nova_create_rename_transaction to atomically update all the inodes. When crash before nova_create_rename_transaction(), the logs do not have the tail pointer updated and hence the new appended entries are ignored.
However, nova_rename() may create a new inode. Here we should set new_pi->valid = 0 to make sure it is invalid at the moment. Then after reboot, the rebuild process should ignore this inode, and the inode's log should not be traversed. I think nova_rebuild_inode() should check pi->valid field, but turns out it only checks pi->deleted. I can't recall the logic here as it has been several years...
Thanks, Andiry
On Tue, Feb 2, 2021 at 12:51 AM hayley-leblanc notifications@github.com wrote:
Hi Andiry,
Thanks for your response! I took a look at nova_rename() and it looks like nova_create_rename_transaction() is called after the call to nova_append_link_change_entry() that appears buggy to me. Would this mean that the log update I'm finding to be buggy is not actually part of the rename transaction?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/NVSL/linux-nova/issues/94*issuecomment-771473783__;Iw!!Mih3wA!VLoMsvKx4QvMkrusfMuLRsZYe5ZGjmjml5XCOcrE6m13vKIqygVjTm-BzAXnMwp7p3w2$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AAKBYEE46MSUVYYTQ7ADHITS464HHANCNFSM4WZN7UMQ__;!!Mih3wA!VLoMsvKx4QvMkrusfMuLRsZYe5ZGjmjml5XCOcrE6m13vKIqygVjTm-BzAXnM2_pHTzM$ .
I may have figured out the problem. My initial take on the issue was wrong - the null pointer deref is triggered when I add a crash during the call to nova_append_link_change_entry()
on the old inode, not a new one. Sorry for the confusion there. I also didn't describe exactly what I did to trigger this bug; I found it by starting from an empty file system and creating a directory, then creating a file in that directory, and trying to rename the file. I think this bug can only be triggered if the file being renamed has just been created and not modified/written to yet, because it involves a crash during log initialization. It seems to me right now that the recovery code works fine as long as the old inode has a log to append to in the transaction, but if we crash while the log is being set up it doesn't work right - does that seem possible?
What I'm seeing happen is, if we rename a brand-new file, it results in the inode's log initialized in nova_initialize_inode_log()
. If we crash during that function such that the inode's new log tail pointer has been persisted but the head tail pointer hasn't, we end up with one null log pointer and one non-null one. The recovery code in nova_traverse_file_inode_log()
expects that either both log pointer fields will be null, or they'll both be non-null (it checks the condition if (curr_p == 0 && pi->log_tail == 0)
where curr_p
is initially a pointer to pi->log_head
). So if it finds that they are not both null, it attempts to traverse the log, which is when the null pointer deref occurs, since one of the log pointers is null. nova_rebuild_file_inode_tree
has the same pattern which will also trigger a null pointer deref if the one in nova_traverse_file_inode_log()
is fixed.
I was able to prevent the null pointer dereference in recovery by changing the if (curr_p == 0 && pi->log_tail == 0)
condition in nova_traverse_file_inode_log()
(and the similar one in nova_rebuild_file_inode_tree()
to if (curr_p == 0 || pi->log_tail == 0)
but it looks like the recovery code still leaves the inode with mismatched log pointers, which probably would need to be fixed to prevent something else from going wrong later.
Thanks for the analysis. It does sound like a bug. We should skip the log traverse if either pointer is nullptr. Perhaps change the condition as you suggested, and print out a warning message and set both pointers to nullptr in this case?
Thanks, Andiry
On Wed, Feb 10, 2021 at 3:17 PM hayley-leblanc notifications@github.com wrote:
I may have figured out the problem. My initial take on the issue was wrong
- the null pointer deref is triggered when I add a crash during the call to nova_append_link_change_entry() on the old inode, not a new one. Sorry for the confusion there. I also didn't describe exactly what I did to trigger this bug; I found it by starting from an empty file system and creating a directory, then creating a file in that directory, and trying to rename the file. I think this bug can only be triggered if the file being renamed has just been created and not modified/written to yet, because it involves a crash during log initialization. It seems to me right now that the recovery code works fine as long as the old inode has a log to append to in the transaction, but if we crash while the log is being set up it doesn't work right - does that seem possible?
What I'm seeing happen is, if we rename a brand-new file, it results in the inode's log initialized in nova_initialize_inode_log(). If we crash during that function such that the inode's new log tail pointer has been persisted but the head tail pointer hasn't, we end up with one null log pointer and one non-null one. The recovery code in nova_traverse_file_inode_log() expects that either both log pointer fields will be null, or they'll both be non-null (it checks the condition if (curr_p == 0 && pi->log_tail == 0) where curr_p is initially a pointer to pi->log_head). So if it finds that they are not both null, it attempts to traverse the log, which is when the null pointer deref occurs, since one of the log pointers is null. nova_rebuild_file_inode_tree has the same pattern which will also trigger a null pointer deref if the one in nova_traverse_file_inode_log() is fixed.
I was able to prevent the null pointer dereference in recovery by changing the if (curr_p == 0 && pi->log_tail == 0) condition in nova_traverse_file_inode_log() (and the similar one in nova_rebuild_file_inode_tree() to if (curr_p == 0 || pi->log_tail == 0) but it looks like the recovery code still leaves the inode with mismatched log pointers, which probably would need to be fixed to prevent something else from going wrong later.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/NVSL/linux-nova/issues/94*issuecomment-777101119__;Iw!!Mih3wA!WDQb1iEhyUUwjHRXNjO2p1b68_zFVpL-TgH7h6VPgsO6j7FlF_ZHyQsNnzPwgKrlsCp6$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AAKBYEHMBBTLFV6OUNVEGPDS6MHXLANCNFSM4WZN7UMQ__;!!Mih3wA!WDQb1iEhyUUwjHRXNjO2p1b68_zFVpL-TgH7h6VPgsO6j7FlF_ZHyQsNnzPwgKLxiMm4$ .
Hi,
Two things in this issue - an instance of a bug I discovered and some potential locations of bugs.
I have a potential crash consistency bug in NOVA's implementation of
rename()
. I believe if the operation involves creating a new inode,nova_rename()
callsnova_append_link_change_entry()
; this callsnova_append_log_entry()
, which callsnova_get_append_head()
, which callsnova_extend_inode_log()
, which callsnova_initialize_inode_log()
.nova_initialize_inode_log()
contains the following code:I injected a crash between the first and third lines of this if statement such that the new value of
pi->log_tail
is persisted, but the value ofpi->log_head
is not. When I tried to mount the resulting file system image, the kernel crashed with a NULL pointer dereference. I believe the issue is ingoto_next_page
with the following lines:because
addr
is set to 0 when this bug is hit.I triggered this bug with the following workload:
mkdir(A), creat(A/foo), close(A/foo), rename(A/foo, A/bar)
with the crash injected in therename()
call. Can you confirm this bug? I don't currently have a patch but I'll try to make a PR if I get an opportunity to write one.I also have a couple questions about some places in NOVA that appear to be missing calls to
nova_flush_buffer()
even though they involve modifying persistent inode fields. I don't have instances of bugs related to any of these at the moment, but I'm curious whether these may be bugs or whether the lack ofnova_flush_buffer()
is intentional (or if I just missed subsequent flushes elsewhere in the code :) )nova_free_inode_resource()
in inode.c modifiespi->deleted
andpi->valid
without flushingpi
nova_fallocate()
in file.c modifiespi->i_flags
on line 330 without flushing itnova_init_inode_table()
in inode.c modifies of the inode table inode without flushing itcheck_eof_blocks()
in inode.c modifiespi->i_flags
on line 380 without flushing itnova_lite_transaction_for_time_and_link()
in namei.c sets a couplepi
fields around line 321 without flushingnova_update_inode_with_rebuild()
in rebuild.c sets a bunch of persistent inode fields without flushing themnova_new_vfs_inode()
in inode.c sets some persistent inode fields around line 1135 without flushing or fencing, although there's a comment indicating that they should be flushed.Thanks!