elastio / elastio-snap

kernel module for taking block-level snapshots and incremental backups of Linux block devices
GNU General Public License v2.0

Sometimes test_reload fails on test_reload_verified_inc #229

Closed skypodolsky closed 1 year ago

skypodolsky commented 1 year ago

During elio-test.sh, the driver sometimes fails on umount with the following kernel oops:

[   61.655725] BUG: unable to handle page fault for address: 0000000000001000
[   61.655753] #PF: supervisor read access in kernel mode
[   61.655765] #PF: error_code(0x0000) - not-present page
[   61.655778] PGD 0 P4D 0
[   61.655793] Oops: 0000 [#1] SMP NOPTI
[   61.655805] CPU: 3 PID: 2147 Comm: umount Tainted: G           OE     5.4.0-110-generic #124-Ubuntu
[   61.655826] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   61.655850] RIP: 0010:file_write_block.cold+0x29/0x122 [elastio_snap]
[   61.655866] Code: ff 48 8b 55 d0 4c 89 fe 48 c7 c7 f0 ad aa c0 e8 c4 27 fe f6 49 8b 4f 68 48 85 c9 74 67 83 3d 38 79 00 00 00 0f 84 cf b1 ff ff <48> 8b 31 48 c7 c7 b5 d0 aa c0 e8 9f 27 fe f6 49 8b 4f 68 e9 b7 b1
[   61.655903] RSP: 0018:ffff9def80b8bdd0 EFLAGS: 00010202
[   61.655919] RAX: 0000000000000053 RBX: 0000000000001000 RCX: 0000000000001000
[   61.655934] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff91866fadc8c0
[   61.655949] RBP: ffff9def80b8be18 R08: 0000000000000d51 R09: 0000000000000004
[   61.655964] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[   61.655980] R13: 0000000000000008 R14: 0000000000000000 R15: ffff918659091400
[   61.655997] FS:  00007f0c1815f840(0000) GS:ffff91866fac0000(0000) knlGS:0000000000000000
[   61.656015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   61.656028] CR2: 0000000000001000 CR3: 000000042b528000 CR4: 0000000000340ee0
[   61.656045] Call Trace:
[   61.656065]  ? vprintk_func+0x4c/0xc0
[   61.656078]  __cow_sync_and_free_sections+0x7b/0xe0 [elastio_snap]
[   61.656093]  __tracer_destroy_cow+0xbf/0x1d0 [elastio_snap]
[   61.656107]  handle_bdev_mount_event+0x202/0x2b0 [elastio_snap]
[   61.656123]  umount_hook+0x93/0x110 [elastio_snap]
[   61.656140]  do_syscall_64+0x57/0x190
[   61.656161]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   61.656173] RIP: 0033:0x7f0c183be16b
[   61.656184] Code: cd 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f5 cc 0c 00 f7 d8 64 89 01 48
[   61.656220] RSP: 002b:00007ffd6f0b6b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[   61.656237] RAX: ffffffffffffffda RBX: 00007f0c184f0204 RCX: 00007f0c183be16b
[   61.656252] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000056010dc8ac40
[   61.656268] RBP: 000056010dc8aa30 R08: 0000000000000000 R09: 00007ffd6f0b5910
[   61.656292] R10: 00007f0c184dc379 R11: 0000000000000246 R12: 000056010dc8ac40
[   61.656308] R13: 0000000000000000 R14: 000056010dc8ab28 R15: 0000000000000000
[   61.656324] Modules linked in: elastio_snap(OE) xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc aufs overlay binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds kvm_amd ccp mac_hid serio_raw kvm qemu_fw_cfg sch_fq_codel msr ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net net_failover failover cirrus drm_kms_helper aesni_intel syscopyarea sysfillrect crypto_simd sysimgblt fb_sys_fops drm cryptd glue_helper psmouse virtio_blk pata_acpi i2c_piix4 floppy [last unloaded: elastio_snap]
[   61.656493] CR2: 0000000000001000
[   61.656506] ---[ end trace d1df488f6a4d14fe ]---
[   61.657073] RIP: 0010:file_write_block.cold+0x29/0x122 [elastio_snap]
[   61.657629] Code: ff 48 8b 55 d0 4c 89 fe 48 c7 c7 f0 ad aa c0 e8 c4 27 fe f6 49 8b 4f 68 48 85 c9 74 67 83 3d 38 79 00 00 00 0f 84 cf b1 ff ff <48> 8b 31 48 c7 c7 b5 d0 aa c0 e8 9f 27 fe f6 49 8b 4f 68 e9 b7 b1
[   61.658772] RSP: 0018:ffff9def80b8bdd0 EFLAGS: 00010202
[   61.659335] RAX: 0000000000000053 RBX: 0000000000001000 RCX: 0000000000001000
[   61.659906] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff91866fadc8c0
[   61.660497] RBP: ffff9def80b8be18 R08: 0000000000000d51 R09: 0000000000000004
[   61.660984] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[   61.661272] R13: 0000000000000008 R14: 0000000000000000 R15: ffff918659091400
[   61.661556] FS:  00007f0c1815f840(0000) GS:ffff91866fac0000(0000) knlGS:0000000000000000
[   61.661840] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   61.662119] CR2: 0000000000001000 CR3: 000000042b528000 CR4: 0000000000340ee0
[   61.668460] elastio-snap: detected block device umount: /tmp/elastio-snap_010 : 0
[   61.669177] elastio-snap: block device umount detected for device 10

Although this happens on both ext4 and xfs (other filesystems were not checked), it seems easier to reproduce on xfs. The sd_cow pointer is probably corrupted when switching to incremental mode.
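A note on why a corrupted pointer would get this far. In the trace, CR2, RBX and RCX all hold 0x1000. The `Code:` bytes appear to decode as a load of RCX from offset 0x68 of the structure in R15, a test for zero, and then the faulting read through RCX. If that reading is right, the pointer was checked against NULL and passed because it was non-zero garbage. The user-space sketch below only illustrates that point. The helper names are hypothetical and not part of elastio-snap, and the range check assumes x86-64 with 4-level paging, where kernel addresses start at 0xffff800000000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64 with 4-level paging, so the kernel half of the
 * address space starts here. */
#define KERNEL_HALF_START 0xffff800000000000ULL

/* Hypothetical helper: the kind of check the faulting code seems to do. */
static bool passes_null_check(uint64_t ptr)
{
    return ptr != 0;
}

/* Hypothetical helper: a coarse sanity check that the value at least
 * lies in the kernel half of the address space. It does not prove the
 * pointer is valid, only that it is not obviously bogus. */
static bool in_kernel_half(uint64_t ptr)
{
    return ptr >= KERNEL_HALF_START;
}

/* Self-check using register values copied from the oops above. */
static void check_oops_values(void)
{
    /* RCX/CR2: non-zero, so a plain NULL check lets it through. */
    assert(passes_null_check(0x1000ULL));
    /* ...but it is clearly not a kernel address. */
    assert(!in_kernel_half(0x1000ULL));
    /* R15 from the same trace looks like a normal kernel pointer. */
    assert(in_kernel_half(0xffff918659091400ULL));
}
```

Calling `check_oops_values()` should pass silently. A check like this would only be a debugging aid for catching the bad value earlier; it would not explain how the pointer came to hold 0x1000 in the first place.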

Relates to the epic #219