NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

Possible bug in failure recovery code #88

Closed hayley-leblanc closed 3 years ago

hayley-leblanc commented 3 years ago

Hi NOVA folks,

Are there any restrictions on the number of cores that must be available on a machine/VM to run NOVA? I believe I may have found a bug in the failure recovery code when there is only 1 core. I am running NOVA with Linux 5.1 and Ubuntu 20.04 on a KVM-Qemu VM with 1 core and a 128MB emulated persistent memory device. I am working on a tool to test the crash consistency of persistent memory file systems, which is what found the potential bug.

The workload that triggers it has the following format:

  1. Mount an existing, but otherwise empty NOVA file system on /mnt/pmem (something like mount -o init -t NOVA /dev/pmem0 /mnt/pmem; umount /dev/pmem0; mount -t NOVA /dev/pmem0 /mnt/pmem; I suspect the bug still occurs without this part but the crash testing tool currently requires a pre-made base file system)
  2. Create file /mnt/pmem/foo
  3. Fsync /mnt/pmem/foo
  4. Crash
  5. Mount the crashed file system
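Steps 2 and 3 could be reproduced from user space with something like the C helper below. This is only a minimal sketch: create_and_fsync is a hypothetical name, not part of the actual test harness, and the crash and remount in steps 4 and 5 have to be driven from outside the program.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` if it does not exist (step 2)
 * and fsync it (step 3). Returns 0 on success, -1 on failure.
 */
int create_and_fsync(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    /* Ask the file system to persist the new file before returning. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

For the workload above, the call would be create_and_fsync("/mnt/pmem/foo"), followed by a simulated crash.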

When I attempt to do step 5, I get the following NULL pointer dereference error:

[   60.822938] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   60.823790] #PF error: [normal kernel read fault]
[   60.824275] PGD 0 P4D 0 
[   60.824533] Oops: 0000 [#1] SMP PTI
[   60.824884] CPU: 0 PID: 1327 Comm: test_harness Tainted: G           OE     5.1.0+ #176
[   60.825694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   60.826585] RIP: 0010:nova_traverse_inode_log.isra.8+0x24/0xc0
[   60.827162] Code: af e8 b0 5e c9 ff 0f 1f 44 00 00 48 85 d2 0f 84 9f 00 00 00 55 48 89 d1 48 89 e5 81 e1 ff 0f 00 00 0f 85 8f 00 00 00 49 89 d0 <48> 8b 46 08 49 c1 e8 0c 3e 4c 0f ab 00 48 8b 87 d8 03 00 00 41
[   60.829282] RSP: 0018:ffff9d7781807a48 EFLAGS: 00010246
[   60.829896] RAX: 0000000000004000 RBX: ffff9d7781807a98 RCX: 0000000000000000
[   60.830720] RDX: 000000000117a000 RSI: 0000000000000000 RDI: ffff90fad3c98800
[   60.831428] RBP: ffff9d7781807a48 R08: 000000000117a000 R09: ffff90faf855de18
[   60.832127] R10: 0000000000000550 R11: 0000000000000001 R12: ffff9d7781807b10
[   60.832829] R13: ffff90fad3c98800 R14: 0000000000000000 R15: ffff90fad3ddc000
[   60.833529] FS:  00007f6838d8e740(0000) GS:ffff90fafbe00000(0000) knlGS:0000000000000000
[   60.834459] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.835036] CR2: 0000000000000008 CR3: 00000001273be000 CR4: 00000000000006b0
[   60.835746] Call Trace:
[   60.836008]  nova_recover_inode_pages+0x8e/0xb0
[   60.836496]  nova_failure_recovery_crawl+0x3c6/0x3f0
[   60.836990]  nova_failure_recovery+0x2e4/0x460
[   60.837433]  ? _cond_resched+0x1d/0x30
[   60.837809]  ? __kmalloc+0x32/0x220
[   60.838160]  nova_recovery.cold.30+0x21b/0x669
[   60.838603]  ? _cond_resched+0x1d/0x30
[   60.838980]  nova_fill_super.cold.21+0x78c/0x9c7
[   60.839439]  mount_bdev+0x191/0x1c0
[   60.839790]  ? nova_update_super_crc+0x80/0x80
[   60.840233]  nova_mount+0x19/0x20
[   60.840600]  legacy_get_tree+0x2c/0x50
[   60.840975]  vfs_get_tree+0x2e/0xe0
[   60.841324]  do_mount+0x7bd/0xd80
[   60.841655]  ? __check_object_size+0x16a/0x196
[   60.842168]  ? memdup_user+0x53/0x80
[   60.842662]  ksys_mount+0xc2/0xe0
[   60.843089]  __x64_sys_mount+0x29/0x30
[   60.843513]  do_syscall_64+0x57/0x100
[   60.843878]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.844380] RIP: 0033:0x7f6839004c4e
[   60.844737] Code: 48 8b 0d 45 82 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 12 82 0c 00 f7 d8 64 89 08
[   60.846626] RSP: 002b:00007ffd12689868 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[   60.847368] RAX: ffffffffffffffda RBX: 000056307803b1e0 RCX: 00007f6839004c4e
[   60.848095] RDX: 0000563076e939c8 RSI: 000056307803b1e0 RDI: 00007ffd12689bc0
[   60.848818] RBP: 00007ffd12689990 R08: 0000000000000000 R09: 0000000000000000
[   60.849690] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000100000000
[   60.850411] R13: 00007ffd12689fd0 R14: 0000000000000000 R15: 0000000000000000
[   60.851124] Modules linked in: 9p fscache nls_iso8859_1 bochs_drm ttm drm_kms_helper psmouse e1000 9pnet_virtio fb_sys_fops syscopyarea 9pnet serio_raw sysfillrect dax_pmem_compat i2c_piix4 device_dax ...
[   60.854359] CR2: 0000000000000008
[   60.854808] ---[ end trace 8ae72eb59522dd68 ]---
[   60.855271] RIP: 0010:nova_traverse_inode_log.isra.8+0x24/0xc0
[   60.855877] Code: af e8 b0 5e c9 ff 0f 1f 44 00 00 48 85 d2 0f 84 9f 00 00 00 55 48 89 d1 48 89 e5 81 e1 ff 0f 00 00 0f 85 8f 00 00 00 49 89 d0 <48> 8b 46 08 49 c1 e8 0c 3e 4c 0f ab 00 48 8b 87 d8 03 00 00 41
[   60.857760] RSP: 0018:ffff9d7781807a48 EFLAGS: 00010246
[   60.858325] RAX: 0000000000004000 RBX: ffff9d7781807a98 RCX: 0000000000000000
[   60.859101] RDX: 000000000117a000 RSI: 0000000000000000 RDI: ffff90fad3c98800
[   60.859803] RBP: ffff9d7781807a48 R08: 000000000117a000 R09: ffff90faf855de18
[   60.860513] R10: 0000000000000550 R11: 0000000000000001 R12: ffff9d7781807b10
[   60.861215] R13: ffff90fad3c98800 R14: 0000000000000000 R15: ffff90fad3ddc000
[   60.861921] FS:  00007f6838d8e740(0000) GS:ffff90fafbe00000(0000) knlGS:0000000000000000
[   60.862743] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.863311] CR2: 0000000000000008 CR3: 00000001273be000 CR4: 00000000000006b0

I traced the error to the nova_failure_recovery_crawl() function in bbuild.c. The line nova_recover_inode_pages(sb, &sih, &task_rings[0], &fake_pi, global_bm[1]); seems to assume that there are at least two CPUs; when I change global_bm[1] to global_bm[0], I no longer encounter the NULL pointer dereference on my VM.
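To illustrate the suspected failure mode, here is a small user-space model. It is a sketch under stated assumptions, not NOVA code: it assumes the bitmap table is a fixed-size array of pointers in which only the first nr_cpus slots get filled in, and the names MAX_CPUS, struct scan_bitmap, bitmaps, alloc_bitmaps and free_bitmaps are all hypothetical. With one CPU, slot 1 is never populated and stays NULL, which would be consistent with the oops above (RSI is 0 and the faulting address is 0x8).

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_CPUS 8 /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for a per-CPU recovery bitmap. */
struct scan_bitmap {
    unsigned long bits;
};

/* Fixed-size table; only the first nr_cpus slots are ever filled. */
static struct scan_bitmap *bitmaps[MAX_CPUS];

/*
 * Allocate one bitmap per CPU. Returns 0 on success, -1 on failure.
 * On failure the caller should still call free_bitmaps().
 */
int alloc_bitmaps(int nr_cpus)
{
    if (nr_cpus < 1 || nr_cpus > MAX_CPUS)
        return -1;

    for (int i = 0; i < nr_cpus; i++) {
        bitmaps[i] = calloc(1, sizeof(*bitmaps[i]));
        if (!bitmaps[i])
            return -1;
    }
    return 0;
}

/* Release every allocated bitmap and reset the table to NULL. */
void free_bitmaps(void)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        free(bitmaps[i]);
        bitmaps[i] = NULL;
    }
}
```

After alloc_bitmaps(1), bitmaps[0] is valid while bitmaps[1] is still NULL, so code that unconditionally passes slot 1 hands a NULL pointer to its callee. Slot 0 exists for any valid CPU count, which is why indexing with 0 avoids the problem in this model.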

Thanks!

Andiry commented 3 years ago

Thanks for the finding. Indeed, in alloc_bm(), global_bm is allocated based on the number of CPUs. I can't recall why I hard-coded global_bm[1] here; there seems to be no reason for it. I guess changing the index to zero is a fix, since which bm is used does not really matter. Care to send a PR?

hayley-leblanc commented 3 years ago

Will do! Thanks!