Open crawfxrd opened 5 years ago
IO on a snapshotted volume will cause a NULL pointer dereference in BFQ.
BFQ requires blk-mq. Enable blk-mq by appending the following to the kernel command line: scsi_mod.use_blk_mq=1
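Before reproducing, it may help to confirm that the boot parameter actually took effect. The following is a minimal sketch, not part of the original report: the function name `cmdline_enables_blk_mq` is hypothetical, and it only looks for the exact `scsi_mod.use_blk_mq=1` token given above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given kernel command line
# contains the exact token scsi_mod.use_blk_mq=1, fail (exit 1) otherwise.
cmdline_enables_blk_mq() {
    # Pad with spaces so the token matches only as a whole word.
    case " $1 " in
        *" scsi_mod.use_blk_mq=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage against the running kernel:
#   cmdline_enables_blk_mq "$(cat /proc/cmdline)" && echo "blk-mq requested for SCSI"
```

The check is deliberately strict: other spellings of a true value are not recognised, so a negative result only means this exact token is absent.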
Steps to reproduce:
    modprobe bfq
    echo bfq > /sys/block/sda/queue/scheduler
    dbdctl setup-snapshot /dev/sda2 /boot/cow.snap 1
    sync
Similar to #6, this only happens for devices directly using the scheduler, not for stacked devices using a different scheduler (e.g., a dm volume using none).
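To see which devices fall into the affected category, one can read each device's `queue/scheduler` file in sysfs, where the active scheduler is shown in square brackets. The following is a rough sketch under that assumption: the function names and the `SYSFS_BLOCK` override variable are hypothetical and exist only to make the snippet easy to try outside a real `/sys` tree.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) scheduler from a
# sysfs-style scheduler line such as "mq-deadline kyber [bfq] none".
# Prints nothing if the line contains no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical helper: list block devices whose queue currently uses bfq.
# SYSFS_BLOCK is an assumed override for testing; it defaults to /sys/block.
list_bfq_devices() {
    root="${SYSFS_BLOCK:-/sys/block}"
    for f in "$root"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        if [ "$(active_scheduler "$(cat "$f")")" = "bfq" ]; then
            dev="${f%/queue/scheduler}"   # strip the trailing path
            printf '%s\n' "${dev##*/}"    # keep only the device name
        fi
    done
}
```

Running `list_bfq_devices` on the test machine should list `sda` after the reproduction steps, while a stacked dm volume using `none` would not appear.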
Trace:
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000198
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 28054 Comm: datto_snap_mrf1 Kdump: loaded Tainted: G OE 4.19.13-200.fc28.x86_64 #1
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
    RIP: 0010:bfq_setup_cooperator+0x29/0x280 [bfq]
    Code: ff 0f 1f 44 00 00 41 57 41 56 41 55 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 83 be 98 01 00 00 00 74 1c 48 8b 35 66 e4 a1 d2 b8 64 00 00 00
    RSP: 0018:ffffa38300993d28 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
    RDX: ffff8e6fe6b47500 RSI: 0000000000000000 RDI: ffff8e6fe6992000
    RBP: ffff8e6fe6992000 R08: ffffa38300993e00 R09: 0000000000000000
    R10: 0000000000001000 R11: 000000336ed5f516 R12: ffff8e6fe6992000
    R13: ffff8e6fb55b43d0 R14: ffff8e6fe6992368 R15: ffffa38300993e00
    FS: 0000000000000000(0000) GS:ffff8e6ffda00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000198 CR3: 000000004420a005 CR4: 00000000001606f0
    Call Trace:
     bfq_insert_requests+0x11a/0xf40 [bfq]
     ? blk_mq_get_tag+0x236/0x260
     blk_mq_sched_insert_request+0x142/0x1d0
     blk_mq_make_request+0x1e5/0x520
     ? __cow_write_header+0xd0/0xd0 [dattobd]
     snap_mrf_thread+0xc5/0x180 [dattobd]
     ? finish_wait+0x80/0x80
     kthread+0x112/0x130
     ? kthread_create_worker_on_cpu+0x70/0x70
     ret_from_fork+0x35/0x40
    Modules linked in: dattobd(OE) bfq ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_pcsp ghash_clmulni_intel snd_pcm iTCO_wdt iTCO_vendor_support snd_timer snd joydev soundcore i2c_i801 lpc_ich virtio_balloon xfs libcrc32c qxl drm_kms_helper ttm crc32c_intel drm serio_raw virtio_net qemu_fw_cfg virtio_console net_failover failover virtio_scsi
    CR2: 0000000000000198
    ---[ end trace 564fda7711752e62 ]---
    RIP: 0010:bfq_setup_cooperator+0x29/0x280 [bfq]
    Code: ff 0f 1f 44 00 00 41 57 41 56 41 55 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 83 be 98 01 00 00 00 74 1c 48 8b 35 66 e4 a1 d2 b8 64 00 00 00
    RSP: 0018:ffffa38300993d28 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
    RDX: ffff8e6fe6b47500 RSI: 0000000000000000 RDI: ffff8e6fe6992000
    RBP: ffff8e6fe6992000 R08: ffffa38300993e00 R09: 0000000000000000
    R10: 0000000000001000 R11: 000000336ed5f516 R12: ffff8e6fe6992000
    R13: ffff8e6fb55b43d0 R14: ffff8e6fe6992368 R15: ffffa38300993e00
    FS: 0000000000000000(0000) GS:ffff8e6ffda00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000198 CR3: 000000004420a005 CR4: 00000000001606f0
May be an upstream issue. Found this report: https://groups.google.com/forum/#!topic/bfq-iosched/-qkFHaw6Ccs
Per Paolo's message, this should be retested on a 4.20 kernel.
Still present as of 5.1.0-1.fc31.x86_64.