NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

Infinite assertion failures and/or hang with metadata_csum=1 and replica_metadata=0 #30

Closed: stevenjswanson closed this 7 years ago

stevenjswanson commented 7 years ago

With these configuration options, I get a whole-system hang, sometimes with and sometimes without a bunch of assertion failures in dmesg.

This is on 84d3e6afa6effd389a0b8a0129dbf22c30c48d5c, but it's been present since at least 10142f3f22403b031b2adde4afd7bd2a85d55086.

This happens with

measure_timing=0
inplace_data_updates=0
wprotect=0 
mmap_cow=1      
unsafe_metadata=0     
replica_metadata=0 
metadata_csum=1 
dram_struct_csum=1      
data_csum=1 
data_parity=1

but not

measure_timing=0
inplace_data_updates=0
wprotect=0
mmap_cow=1
unsafe_metadata=0
replica_metadata=0 
metadata_csum=0 ***
dram_struct_csum=1 
data_csum=1 
data_parity=1

# sudo umount /mnt/ramdisk
# sudo rmmod nova
# sudo modprobe nova measure_timing=0 inplace_data_updates=0 wprotect=0 mmap_cow=1 unsafe_metadata=0 replica_metadata=0 metadata_csum=1 dram_struct_csum=1 data_csum=1 data_parity=1
# sudo mount -t NOVA -o init /dev/pmem0 /mnt/ramdisk
# echo 1 | sudo tee /proc/fs/NOVA/pmem0/create_snapshot
# cat /proc/fs/NOVA/pmem0/snapshots
<hangs>
# dmesg
[ 4064.180232] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
[ 4064.181171] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
[ 4064.181208] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
[ 4064.181229] assertion failed fs/nova/rebuild.c:673: 0
[ 4064.181348] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
[ 4064.181386] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
[ 4064.181442] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
[ 4064.181457] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
[ 4064.181508] assertion failed fs/nova/rebuild.c:673: 0
[ 4064.181541] assertion failed fs/nova/rebuild.c:673: 0
[ 4064.181556] nova: nova_rebuild_dir_inode_tree: unknown type 195, 0x3eee000
<...forever...>

luzh commented 7 years ago

metadata_csum shouldn't be used without replica_metadata. We rely on the tick-tock scheme to maintain consistency, because updating a checksummed metadata structure touches at least two fields (the field itself plus the checksum), so some code may not handle this configuration well. I think we can fix it at mount time by setting replica_metadata = 1 whenever metadata_csum = 1.
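
To illustrate why a checksum needs a second copy, here is a minimal sketch of a tick-tock style update. It is not NOVA's code: `struct meta`, `toy_csum`, the `crash_point` enum, and every function name are hypothetical, and the checksum is a toy stand-in. The idea is that the primary copy is fully updated (value, then checksum) before the replica is touched, so a crash at any point leaves at least one copy whose checksum still verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata record: one payload field plus a checksum over it. */
struct meta {
    uint64_t value;
    uint32_t csum;
};

/* Toy checksum for illustration only; not the checksum NOVA uses. */
static uint32_t toy_csum(uint64_t v)
{
    return (uint32_t)(v ^ (v >> 32)) * 2654435761u + 1u;
}

static int meta_valid(const struct meta *m)
{
    return m->csum == toy_csum(m->value);
}

/* Points at which a simulated crash can interrupt an update. */
enum crash_point {
    NO_CRASH,
    CRASH_MID_PRIMARY,   /* primary value written, checksum stale */
    CRASH_AFTER_PRIMARY, /* primary complete, replica untouched */
    CRASH_MID_REPLICA    /* replica value written, checksum stale */
};

/* Tick-tock update: finish the primary before touching the replica. */
static void tick_tock_update(struct meta *primary, struct meta *replica,
                             uint64_t v, enum crash_point crash)
{
    assert(primary != NULL && replica != NULL);

    primary->value = v;                 /* tick, field 1 */
    if (crash == CRASH_MID_PRIMARY)
        return;                         /* replica still old and valid */
    primary->csum = toy_csum(v);        /* tick, field 2 */
    if (crash == CRASH_AFTER_PRIMARY)
        return;

    replica->value = v;                 /* tock, field 1 */
    if (crash == CRASH_MID_REPLICA)
        return;                         /* primary already new and valid */
    replica->csum = toy_csum(v);        /* tock, field 2 */
}

/*
 * Recovery: copy whichever copy verifies over the other, preferring the
 * primary. Returns 0 on success, -1 if neither copy verifies.
 */
static int tick_tock_recover(struct meta *primary, struct meta *replica)
{
    if (meta_valid(primary)) {
        *replica = *primary;
        return 0;
    }
    if (meta_valid(replica)) {
        *primary = *replica;
        return 0;
    }
    return -1;
}
```

With only one copy, a crash between the two writes leaves a record that fails its checksum and has nothing to be repaired from, which is the situation `metadata_csum=1` with `replica_metadata=0` sets up.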

Andiry commented 7 years ago

That's true. I think we should just combine these two flags.

stevenjswanson commented 7 years ago

I agree.

-steve

luzh commented 7 years ago

A quick fix just to make them equal: https://github.com/NVSL/linux-nova/commit/a5f89a2f3bd68c4adad30e53ad695995ca99599e
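
The linked commit is not reproduced here. As a rough sketch of the idea proposed earlier in the thread (force `replica_metadata` on whenever `metadata_csum` is on), a mount-time normalization step might look like the following. `struct nova_opts_sketch` and `normalize_metadata_opts` are hypothetical names used only for illustration, not identifiers from the NOVA source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two module parameters discussed above. */
struct nova_opts_sketch {
    int metadata_csum;
    int replica_metadata;
};

/*
 * Checksummed metadata depends on having a replica, so switch the replica
 * on if the caller asked for checksums without it.
 * Returns true if the options had to be adjusted.
 */
static bool normalize_metadata_opts(struct nova_opts_sketch *o)
{
    assert(o != NULL);

    if (o->metadata_csum && !o->replica_metadata) {
        o->replica_metadata = 1;
        return true;
    }
    return false;
}
```

A caller would run this once before mounting and could log a warning when it returns true, so the user learns that the requested combination was changed.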

stevenjswanson commented 7 years ago

If we are going to fix it, let’s fix it right and just have one flag. This just makes the code messier and its behavior harder to understand. -steve

luzh commented 7 years ago

Moved to https://github.com/NVSL/linux-nova/issues/35