NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

Some bugs in metadata_csum mode #100

Closed hayley-leblanc closed 2 years ago

hayley-leblanc commented 3 years ago

Hi,

I think I've found a couple of bugs in NOVA's metadata_csum mode.

The first bug occurs while mounting a cleanly unmounted instance of NOVA. I've been able to trigger it by initializing a new instance of NOVA, creating a directory, and then unmounting it. On remount, one of my collaborators has reported a GPF crash. I don't get a crash on my machine, but on a kernel compiled with KASAN, I get the following KASAN report and partial call trace (which matches the trace my collaborator sees on the GPF):

[   70.122633] ==================================================================
[   70.123476] BUG: KASAN: wild-memory-access in memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.124244] Write of size 120 at addr ffff1101b88f78a8 by task mount/1350

[   70.125064] CPU: 0 PID: 1350 Comm: mount Tainted: G           OE     5.1.0+ #415
[   70.125066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   70.125067] Call Trace:
[   70.125081]  dump_stack+0x94/0xd8
[   70.125093]  ? memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125103]  ? memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125111]  kasan_report+0x171/0x18c
[   70.125122]  ? memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125126]  check_memory_region+0x137/0x190
[   70.125129]  kasan_check_write+0x14/0x20
[   70.125139]  memcpy_to_pmem_nocache+0x27/0x41 [nova]
[   70.125150]  nova_free_inode_log+0x2b7/0x506 [nova]
[   70.125162]  ? nova_free_contiguous_log_blocks+0x213/0x213 [nova]
[   70.125171]  ? nova_insert_range_node+0x187/0x198 [nova]
[   70.125182]  nova_init_blockmap_from_inode+0x1ad/0x5ce [nova]

I did a little digging and it appears to be triggered by a strange value (0xffff1101b88f78a8 in this particular instance) in sih->alter_pi_addr being accessed in nova_free_inode_log.

The second bug is a crash consistency bug. It occurs when a crash happens while a file's size is being increased by a write or truncate operation, or while a file is being renamed, and nova_extend_inode_log calls nova_initialize_inode_log. There seems to be a window between the initialization of the main log pointers and the alter log pointers in which a crash can cause problems. Specifically, if the system crashes after updating the main log pointers and checksum for the file's inode, but before it finishes updating the alter log pointers and calculating the new checksum, the file in question can't be deleted after the crashed file system is mounted again. Attempting to delete it gives an ENOSPC error. I don't know exactly what's going wrong during the unlink call that causes it to fail. I have observed that in crash states where the unlink fails, the recovery procedure prints "nova: nova_check_inode_integrity: inode replica 33 is stale, trying to repair using the primary" in dmesg. When the unlink succeeds, I still see a "nova_repair_inode: inode 33 error repaired" message, but it seems to be coming from a checksum error rather than a stale replica.

I don't currently have a fix ready for either bug, although I have observed that the crash consistency bug seems to go away if the call to nova_update_inode_checksum is moved out of nova_initialize_inode_log and made only after both the primary and alter logs have been initialized.
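To make the window concrete, here is a minimal user-space sketch in C. It is not NOVA code: `struct toy_inode`, `toy_csum`, `extend_logs`, `recover`, and the step numbering are all hypothetical names invented for illustration, and the checksum is deliberately trivial. The model assumes that the replica inode still holds the pre-extension state when the crash happens, and that recovery trusts the primary inode whenever its checksum verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAIN_LOG  0x1000u   /* made-up address of the main log page */
#define ALTER_LOG 0x2000u   /* made-up address of the alternate log page */

/* Hypothetical, simplified inode; NOT the real NOVA on-media layout. */
struct toy_inode {
    uint64_t log_head, log_tail;             /* main log pointers */
    uint64_t alter_log_head, alter_log_tail; /* alternate log pointers */
    uint32_t csum;                           /* covers the four pointers */
};

struct toy_pm { struct toy_inode primary, replica; };

enum csum_order { CSUM_PER_LOG, CSUM_AFTER_BOTH };

/* Trivial stand-in checksum, chosen only so the example is easy to follow. */
static uint32_t toy_csum(const struct toy_inode *pi)
{
    return (uint32_t)~(pi->log_head ^ (pi->log_tail << 1) ^
                       (pi->alter_log_head << 2) ^ (pi->alter_log_tail << 3));
}

/* Apply only the first `steps` persistent updates, then "crash". */
static void extend_logs(struct toy_inode *pi, enum csum_order order, int steps)
{
    int done = 0;

    if (done++ == steps) return;
    pi->log_head = pi->log_tail = MAIN_LOG;          /* init main log */
    if (order == CSUM_PER_LOG) {
        if (done++ == steps) return;
        pi->csum = toy_csum(pi);                     /* checksum after main log only */
    }
    if (done++ == steps) return;
    pi->alter_log_head = pi->alter_log_tail = ALTER_LOG; /* init alternate log */
    if (done++ == steps) return;
    pi->csum = toy_csum(pi);                         /* final checksum */
}

/* Assumed recovery policy: a primary with a valid checksum wins. */
static void recover(struct toy_pm *pm)
{
    if (pm->primary.csum == toy_csum(&pm->primary)) {
        pm->replica = pm->primary;   /* replica is stale, repair from primary */
    } else {
        assert(pm->replica.csum == toy_csum(&pm->replica));
        pm->primary = pm->replica;   /* checksum error, repair from replica */
    }
}

/* True if, after crash and recovery, the main log exists but the alternate does not. */
bool crash_leaves_mismatch(enum csum_order order, int steps)
{
    struct toy_pm pm;

    memset(&pm, 0, sizeof(pm));
    pm.primary.csum = toy_csum(&pm.primary);
    pm.replica = pm.primary;

    extend_logs(&pm.primary, order, steps);
    recover(&pm);
    return pm.primary.log_head != 0 && pm.primary.alter_log_head == 0;
}
```

In this model, `crash_leaves_mismatch(CSUM_PER_LOG, 2)` is the only crash point that survives recovery with a main log but no alternate log, because the intermediate checksum makes the half-updated primary look valid. With `CSUM_AFTER_BOTH`, every partial state fails verification and is rolled back from the replica.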

Let me know what you think! Thanks.

Andiry commented 3 years ago

Are you running on a VM or a bare-metal machine? I think for NOVA running on a VM you have to disable some kernel build flags to avoid some weird memory access problems. On bare metal it should work.

Not sure about the other bug though. Is there a way to reproduce it consistently?

(Sorry for the late replies. I have graduated and thus do not have much time to work on NOVA.)

hayley-leblanc commented 3 years ago

We are using VMs, so that could be the cause of the memory issue. Do you know what the correct kernel configuration is?

We are working on a tool to test PM file systems, which is able to consistently reproduce the bug, but the tool isn't open source yet. I'll see if I can figure out exactly what's causing the issue within the tool and try to provide some more info.

Also, no worries at all! Thanks for continuing to maintain NOVA :)

Andiry commented 3 years ago

Hi Jian,

Do you have suggestions for the kernel configuration for NOVA running on a VM? Thanks.

Andiry


sheepx86 commented 3 years ago

I ran NOVA on VMs a couple of years ago. The hypervisor was VMware ESXi 6.5. If I remember correctly, I disabled KPTI (and KASLR), but I'm not sure if it could run with KPTI on.

Regarding running NOVA in a VM, you could also try disabling any para-virtualized memory ballooning drivers (e.g., vmtools, virtio-balloon), and try disabling memory overcommitment or deduplication (e.g., KSM, kernel option CONFIG_KSM).

hayley-leblanc commented 3 years ago

Awesome, thank you. We'll try those configuration changes and see if it resolves the issue. I'm currently working on pinning down the cause of the potential crash consistency bug I mentioned in this issue and I'll try to post some more details soon.

hayley-leblanc commented 3 years ago

We were able to avoid the GPF/KASAN issue by disabling KPTI - thanks for the help there! Is there a reason why NOVA has issues running on VMs in general? Can it handle KPTI being on when run on the host?

Also, I have some more details on the potential crash consistency issue. I believe the issue arises when, in the primary inode, the main log pointers are present after a crash, but the alternate log pointers are null. At the end of unlink, NOVA appends an entry to the unlinked inode's logs using nova_append_link_change_entry, initializing the logs if they don't already exist. NOVA appears to only use the inode's main log to determine whether it needs to initialize the logs. When the main log pointers are null, NOVA initializes both the main and alternate logs and everything works fine. However, if the main log was initialized pre-crash, NOVA seems to assume that the alternate log is always initialized too and attempts to append to it. This fails (and causes the unlink call to fail too) if the alternate log was not actually initialized pre-crash.

We found that such a crash state, where an inode's main log is initialized but the alternate log is not, can arise if the system crashes between calling nova_initialize_inode_log on the main log and the alternate log, as is done in nova_extend_inode_log. I believe the fix I proposed in a previous comment (updating the inode checksum after both log initializations) works because it leads to the replica inode being used to repair the primary inode in the problematic crash states, which restores the primary inode to a state where it doesn't have mismatched main and alternate logs. I think it could also be fixed by checking for this specific situation and initializing the alternate log if it arises, either in the recovery code or in log append code.
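The second fix idea (checking for the mismatched state in the log append path) can be sketched as follows. This is a toy user-space model in C, not a patch against NOVA: `struct sketch_inode`, `sketch_alloc_log_page`, `append_entry_unchecked`, and `append_entry_checked` are hypothetical names, and returning `-ENOSPC` from the unchecked path simply mirrors the error observed on unlink rather than the real code path that produces it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRY_SIZE 64u  /* made-up log entry size */

/* Hypothetical, simplified inode: only the two log tails matter here. */
struct sketch_inode {
    uint64_t log_tail;        /* 0 means the main log is uninitialized */
    uint64_t alter_log_tail;  /* 0 means the alternate log is uninitialized */
};

/* Hypothetical allocator handing out fresh, nonzero log page addresses. */
static uint64_t sketch_alloc_log_page(void)
{
    static uint64_t next = 0x1000;
    uint64_t addr = next;

    next += 0x1000;
    return addr;
}

/* Behaviour described above: only the main log decides whether to initialize. */
int append_entry_unchecked(struct sketch_inode *pi)
{
    assert(pi != NULL);
    if (pi->log_tail == 0) {
        pi->log_tail = sketch_alloc_log_page();
        pi->alter_log_tail = sketch_alloc_log_page();
    }
    if (pi->alter_log_tail == 0)
        return -ENOSPC;  /* appending to a missing alternate log fails */

    pi->log_tail += SKETCH_ENTRY_SIZE;
    pi->alter_log_tail += SKETCH_ENTRY_SIZE;
    return 0;
}

/* Defensive variant: check each log independently before appending. */
int append_entry_checked(struct sketch_inode *pi)
{
    assert(pi != NULL);
    if (pi->log_tail == 0)
        pi->log_tail = sketch_alloc_log_page();
    if (pi->alter_log_tail == 0)
        pi->alter_log_tail = sketch_alloc_log_page();  /* repair the mismatch */

    pi->log_tail += SKETCH_ENTRY_SIZE;
    pi->alter_log_tail += SKETCH_ENTRY_SIZE;
    return 0;
}
```

Starting from an inode with `log_tail` set and `alter_log_tail` zero, the unchecked variant fails while the checked variant initializes the missing alternate log and appends to both. A real fix would also need to persist the repaired pointers and update the inode checksum, which this sketch leaves out.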

Andiry commented 3 years ago

Thanks for the analysis. Do you have a CL for this, or is it already committed? Either way, PRs are welcome.

hayley-leblanc commented 2 years ago

Closing because the crash consistency bug has been fixed and the GPF will be addressed per #115.