lucab closed this issue 1 year ago
Same kernel also hit another GPF the day before, and it seems to be coming from the same BFQ logic:
[ 71.653536] general protection fault, probably for non-canonical address 0x1010006000001bd: 0000 [#1] SMP PTI
[ 71.663618] CPU: 0 PID: 253 Comm: kworker/u2:3 Not tainted 5.14.16-301.fc35.x86_64 #1
[ 71.671643] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 71.681067] Workqueue: writeback wb_workfn (flush-8:0)
[ 71.686354] RIP: 0010:__bfq_deactivate_entity+0x15c/0x240
[ 71.691950] Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
[ 71.711025] RSP: 0018:ffffb656c015b538 EFLAGS: 00010006
[ 71.716374] RAX: 0101000600000195 RBX: ffff9bea12d6e098 RCX: 0101000600000195
[ 71.723628] RDX: ffff9bea0b4453c8 RSI: ffff9bea0b4453d0 RDI: 00000009f8ea36ed
[ 71.730961] RBP: 0000000000000000 R08: ffff9bea1211b4e0 R09: ffff9bea1211b4e0
[ 71.738217] R10: 0000000000000026 R11: 0000000000000000 R12: ffff9bea0b70f958
[ 71.745470] R13: 0000000000000001 R14: ffff9bea0b70f960 R15: ffff9bea12d6e098
[ 71.752715] FS: 0000000000000000(0000) GS:ffff9bea2c000000(0000) knlGS:0000000000000000
[ 71.760916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 71.766771] CR2: 000056063b242000 CR3: 0000000112f2a004 CR4: 00000000001706f0
[ 71.774032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 71.781281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 71.788521] Call Trace:
[ 71.791091] bfq_deactivate_entity+0x4f/0xc0
[ 71.795484] bfq_del_bfqq_busy+0xc9/0x180
[ 71.799604] __bfq_bfqq_expire+0x95/0xc0
[ 71.803644] bfq_bfqq_expire+0x3bd/0x9a0
[ 71.807689] ? bfq_may_expire_for_budg_timeout+0x7f/0x1b0
[ 71.813213] ? bfq_active_extract+0x8e/0x140
[ 71.817592] ? bfq_bfqq_served+0xb0/0x1c0
[ 71.821707] bfq_dispatch_request+0x3fd/0x1220
[ 71.826268] ? sbitmap_get+0x86/0x190
[ 71.830052] __blk_mq_do_dispatch_sched+0x1d1/0x2e0
[ 71.835047] __blk_mq_sched_dispatch_requests+0xd8/0x130
[ 71.840492] blk_mq_sched_dispatch_requests+0x30/0x60
[ 71.845654] __blk_mq_run_hw_queue+0x2d/0x60
[ 71.850031] __blk_mq_delay_run_hw_queue+0x144/0x150
[ 71.855104] blk_mq_sched_insert_requests+0x63/0xe0
[ 71.860104] blk_mq_flush_plug_list+0xed/0x170
[ 71.864827] blk_mq_submit_bio+0x280/0x580
[ 71.869030] submit_bio_noacct+0x410/0x4e0
[ 71.873246] ? unlock_page_memcg+0x18/0x70
[ 71.877453] iomap_submit_ioend+0x4e/0x80
[ 71.881574] iomap_do_writepage+0x455/0x730
[ 71.885927] write_cache_pages+0x173/0x3b0
[ 71.890148] ? iomap_write_begin+0x460/0x460
[ 71.894557] iomap_writepages+0x1c/0x40
[ 71.898503] xfs_vm_writepages+0x6e/0x90 [xfs]
[ 71.903261] do_writepages+0x31/0xb0
[ 71.906945] ? __wb_calc_thresh+0x2a/0x100
[ 71.911245] ? wb_calc_thresh+0x41/0x50
[ 71.915205] __writeback_single_inode+0x39/0x280
[ 71.919957] writeback_sb_inodes+0x1d8/0x440
[ 71.924335] __writeback_inodes_wb+0x4c/0xe0
[ 71.928715] wb_writeback+0x1da/0x280
[ 71.932495] wb_workfn+0x2b4/0x4a0
[ 71.936025] ? check_preempt_curr+0x55/0x70
[ 71.940334] ? ttwu_do_wakeup+0x17/0x150
[ 71.944370] process_one_work+0x1ec/0x390
[ 71.948501] worker_thread+0x53/0x3e0
[ 71.952274] ? process_one_work+0x390/0x390
[ 71.956652] kthread+0x127/0x150
[ 71.959999] ? set_kthread_struct+0x40/0x40
[ 71.964293] ret_from_fork+0x22/0x30
[ 71.967984] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass snd_pcsp snd_pcm rapl snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover failover serio_raw virtio_scsi ipmi_devintf ipmi_msghandler fuse
[ 72.011990] ---[ end trace 34d26a14750edabd ]---
[ 72.016730] RIP: 0010:__bfq_deactivate_entity+0x15c/0x240
[ 72.022354] Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
[ 72.041234] RSP: 0018:ffffb656c015b538 EFLAGS: 00010006
[ 72.046878] RAX: 0101000600000195 RBX: ffff9bea12d6e098 RCX: 0101000600000195
[ 72.054124] RDX: ffff9bea0b4453c8 RSI: ffff9bea0b4453d0 RDI: 00000009f8ea36ed
[ 72.061373] RBP: 0000000000000000 R08: ffff9bea1211b4e0 R09: ffff9bea1211b4e0
[ 72.068629] R10: 0000000000000026 R11: 0000000000000000 R12: ffff9bea0b70f958
[ 72.075883] R13: 0000000000000001 R14: ffff9bea0b70f960 R15: ffff9bea12d6e098
[ 72.083248] FS: 0000000000000000(0000) GS:ffff9bea2c000000(0000) knlGS:0000000000000000
[ 72.091462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 72.097335] CR2: 000056063b242000 CR3: 0000000112f2a004 CR4: 00000000001706f0
[ 72.104682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 72.113512] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
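For anyone reading the trace above: the faulting address in the GPF line can be reproduced from the register dump. The bytes at the marked instruction (`<48> 8b 48 28`) decode to a 64-bit load from `0x28(%rax)`, and RAX plus 0x28 gives the address in the message. Below is a minimal Python sketch of that arithmetic, assuming 4-level paging (48-bit virtual addresses). `is_canonical` is a hypothetical helper name, not a kernel API.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumption: va_bits significant address bits (48 with 4-level paging).
    A canonical address has bits 63..(va_bits - 1) all equal.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Values copied from the first trace in this thread.
rax = 0x0101000600000195
rbx = 0xFFFF9BEA12D6E098

# The faulting instruction loads from 0x28(%rax).
fault_addr = rax + 0x28

print(hex(fault_addr), is_canonical(fault_addr))
print(hex(rbx), is_canonical(rbx))
```

A value like this in RAX suggests the pointer was read from memory that no longer holds a valid tree node, though the trace alone does not prove that.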
Forwarded to the Fedora bugzilla, https://bugzilla.redhat.com/show_bug.cgi?id=2022819.
Mindlessly searching for the GPF with bfq_deactivate_entity hit on https://lkml.org/lkml/2020/10/9/1120
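Since searching by crashing function is what turned up that LKML thread, here is a small sketch for pulling the symbol out of a console log. It assumes the `RIP: <segment>:<symbol>+0x<offset>/0x<size>` layout seen in the traces in this issue. `crash_symbols` is a made-up helper name.

```python
import re

# Matches kernel-mode RIP lines such as
#   RIP: 0010:__bfq_deactivate_entity+0x15c/0x240
# User-space RIP lines (a bare hex address, no symbol+offset) do not match.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def crash_symbols(console_text: str) -> list[str]:
    """Return the distinct RIP symbols in a console log, in first-seen order."""
    seen: list[str] = []
    for match in RIP_RE.finditer(console_text):
        symbol = match.group("symbol")
        if symbol not in seen:
            seen.append(symbol)
    return seen


sample = (
    "[ 71.686354] RIP: 0010:__bfq_deactivate_entity+0x15c/0x240\n"
    "[ 72.016730] RIP: 0010:__bfq_deactivate_entity+0x15c/0x240\n"
    "[ 135.867220] RIP: 0010:rb_next+0x21/0x50\n"
)
print(crash_symbols(sample))
```

The oops prints the RIP line twice (once before and once after the call trace), which is why duplicates are dropped.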
openQA found that as well on openSUSE Tumbleweed, both in 5.14.14 and 5.15.2: https://bugzilla.opensuse.org/show_bug.cgi?id=1192714
In most of the failure cases there's no GPF or NULL deref message; the system just freezes completely (cursor stops blinking).
I was able to get a KASAN use-after-free report (also in the oS bug report):
```
[ 235.949241] ==================================================================
[ 235.950326] BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
[ 235.951369] Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544
[ 235.953476] CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G E 5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
[ 235.955726] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[ 235.958007] Call Trace:
[ 235.959157]
```
FYI, the result of the investigation so far is https://lore.kernel.org/all/20211125172809.GC19572@quack2.suse.cz/
Thanks to the openSUSE folks, there is an initial patch going through review at https://patchwork.kernel.org/project/linux-block/list/?series=588567&state=%2A&archive=both.
Just for completeness: we saw a similar oops in kola-gcp#704 when testing 35.20211209.20.0.
[ 135.820793] BUG: unable to handle page fault for address: 0000000100000008
[ 135.827815] #PF: supervisor read access in kernel mode
[ 135.833063] #PF: error_code(0x0000) - not-present page
[ 135.838478] PGD 0 P4D 0
[ 135.841137] Oops: 0000 [#1] SMP PTI
[ 135.844747] CPU: 0 PID: 110 Comm: kworker/u2:2 Not tainted 5.15.6-200.fc35.x86_64 #1
[ 135.852599] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 135.861952] Workqueue: writeback wb_workfn (flush-8:0)
[ 135.867220] RIP: 0010:rb_next+0x21/0x50
[ 135.871167] Code: 00 00 c3 66 0f 1f 44 00 00 48 8b 17 48 39 d7 74 35 48 8b 47 08 48 85 c0 74 1c 49 89 c0 48 8b 40 10 48 85 c0 75 f4 4c 89 c0 c3 <48> 3b 78 08 75 f6 48 8b 10 48 89 c7 48 89 d0 48 83 e0 fc 49 89 c0
[ 135.890307] RSP: 0018:ffffa9b7c094b5a8 EFLAGS: 00010006
[ 135.895668] RAX: 0000000100000000 RBX: ffff8ccd8d7ef158 RCX: 0000000000000000
[ 135.903013] RDX: 0000000100000001 RSI: ffff8ccd86a66898 RDI: ffff8ccd86a66898
[ 135.910434] RBP: ffff8ccd86a66898 R08: 0000000100000000 R09: 0000000000000000
[ 135.917869] R10: 0000000000000015 R11: 0000000000000000 R12: 0000000000000000
[ 135.925217] R13: ffff8ccd86a208c0 R14: ffff8ccd86a208c0 R15: ffff8ccd8da21e00
[ 135.932463] FS: 0000000000000000(0000) GS:ffff8ccdac000000(0000) knlGS:0000000000000000
[ 135.940663] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 135.946530] CR2: 0000000100000008 CR3: 0000000106b5e004 CR4: 00000000001706f0
[ 135.953781] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 135.961056] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 135.968300] Call Trace:
[ 135.970884] <TASK>
[ 135.973103] bfq_idle_extract+0xa6/0xb0
[ 135.977095] bfq_put_idle_entity+0x12/0x60
[ 135.981312] bfq_bfqq_served+0xb0/0x1c0
[ 135.985270] bfq_dispatch_request+0x294/0x1220
[ 135.989875] ? sbitmap_get+0x86/0x190
[ 135.993841] __blk_mq_do_dispatch_sched+0x1d1/0x2e0
[ 135.998864] __blk_mq_sched_dispatch_requests+0xd8/0x130
[ 136.004306] blk_mq_sched_dispatch_requests+0x30/0x60
[ 136.009488] __blk_mq_run_hw_queue+0x2d/0x60
[ 136.013894] __blk_mq_delay_run_hw_queue+0x144/0x150
[ 136.018984] blk_mq_sched_insert_requests+0x63/0xe0
[ 136.024006] blk_mq_flush_plug_list+0xed/0x170
[ 136.028568] blk_mq_submit_bio+0x280/0x580
[ 136.032787] __submit_bio+0x1d3/0x200
[ 136.036569] ? __mod_lruvec_page_state+0x5d/0x90
[ 136.041295] ? unlock_page_memcg+0x18/0x70
[ 136.045523] submit_bio_noacct+0x245/0x280
[ 136.049750] iomap_submit_ioend+0x4e/0x80
[ 136.053883] iomap_do_writepage+0x4db/0x700
[ 136.058312] write_cache_pages+0x176/0x3c0
[ 136.062635] ? iomap_write_begin+0x4e0/0x4e0
[ 136.067021] iomap_writepages+0x1c/0x40
[ 136.070992] xfs_vm_writepages+0x6e/0x90 [xfs]
[ 136.075669] do_writepages+0xca/0x1e0
[ 136.079447] ? usleep_range+0x60/0x60
[ 136.083217] ? __schedule+0x31c/0x1500
[ 136.087071] ? fprop_fraction_percpu+0x2b/0x70
[ 136.091657] ? __wb_calc_thresh+0x2a/0x100
[ 136.095861] __writeback_single_inode+0x39/0x280
[ 136.100587] writeback_sb_inodes+0x1d8/0x440
[ 136.105065] __writeback_inodes_wb+0x4c/0xe0
[ 136.109541] wb_writeback+0x1be/0x260
[ 136.113779] wb_workfn+0x2b4/0x4a0
[ 136.117312] ? check_preempt_curr+0x55/0x70
[ 136.121633] ? ttwu_do_wakeup+0x17/0x150
[ 136.126020] process_one_work+0x1f1/0x390
[ 136.130166] worker_thread+0x53/0x3e0
[ 136.133957] ? process_one_work+0x390/0x390
[ 136.138422] kthread+0x127/0x150
[ 136.141772] ? set_kthread_struct+0x40/0x40
[ 136.146075] ret_from_fork+0x22/0x30
[ 136.149781] </TASK>
[ 136.152164] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass snd_pcsp rapl snd_pcm snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_net net_failover virtio_scsi serio_raw failover ipmi_devintf ipmi_msghandler fuse
[ 136.197713] CR2: 0000000100000008
[ 136.201168] ---[ end trace 3fd70f5585fa105d ]---
[ 136.205998] RIP: 0010:rb_next+0x21/0x50
[ 136.209977] Code: 00 00 c3 66 0f 1f 44 00 00 48 8b 17 48 39 d7 74 35 48 8b 47 08 48 85 c0 74 1c 49 89 c0 48 8b 40 10 48 85 c0 75 f4 4c 89 c0 c3 <48> 3b 78 08 75 f6 48 8b 10 48 89 c7 48 89 d0 48 83 e0 fc 49 89 c0
[ 136.229902] RSP: 0018:ffffa9b7c094b5a8 EFLAGS: 00010006
[ 136.235245] RAX: 0000000100000000 RBX: ffff8ccd8d7ef158 RCX: 0000000000000000
[ 136.242492] RDX: 0000000100000001 RSI: ffff8ccd86a66898 RDI: ffff8ccd86a66898
[ 136.249866] RBP: ffff8ccd86a66898 R08: 0000000100000000 R09: 0000000000000000
[ 136.257248] R10: 0000000000000015 R11: 0000000000000000 R12: 0000000000000000
[ 136.264519] R13: ffff8ccd86a208c0 R14: ffff8ccd86a208c0 R15: ffff8ccd8da21e00
[ 136.271779] FS: 0000000000000000(0000) GS:ffff8ccdac000000(0000) knlGS:0000000000000000
[ 136.280082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 136.286084] CR2: 0000000100000008 CR3: 0000000106b5e004 CR4: 00000000001706f0
[ 136.293340] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 136.300834] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Full console output: console.txt
Saw something similar today in kola-gcp#757 with testing-devel (35.20220120.20.0) and kernel-5.15.10-200.fc35.x86_64:
[ 82.862358] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 82.869594] #PF: supervisor read access in kernel mode
[ 82.874937] #PF: error_code(0x0000) - not-present page
[ 82.880276] PGD 0 P4D 0
[ 82.882934] Oops: 0000 [#1] SMP PTI
[ 82.886542] CPU: 0 PID: 110 Comm: kworker/u2:2 Not tainted 5.15.10-200.fc35.x86_64 #1
[ 82.894702] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 82.904124] Workqueue: writeback wb_workfn (flush-8:0)
[ 82.909389] RIP: 0010:bfq_put_idle_entity+0x1b/0x60
[ 82.914645] Code: 1f 44 00 00 8d 04 bf 01 c0 f7 d8 83 c0 50 c3 0f 1f 44 00 00 55 48 89 fd 53 48 89 f3 e8 2e fa ff ff 48 8b 43 68 48 83 7b 60 00 <48> 8b 00 c6 43 18 00 75 24 48 63 53 4c 48 8d bb 78 ff ff ff 48 29
[ 82.933715] RSP: 0000:ffffbb528094b5d0 EFLAGS: 00010082
[ 82.939056] RAX: 0000000000000000 RBX: ffff969fccc983d0 RCX: 0000000000000000
[ 82.946469] RDX: ffff969fccc98390 RSI: ffff969fccc983b0 RDI: ffff969fccc983d0
[ 82.953821] RBP: ffff969fcb6c9958 R08: ffff969fcb6c9960 R09: 0000000000000000
[ 82.961332] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000010
[ 82.968749] R13: ffff969fc4f5f3b0 R14: ffff969fc4f5f3b0 R15: ffff969fd21af3c0
[ 82.975989] FS: 0000000000000000(0000) GS:ffff969fec000000(0000) knlGS:0000000000000000
[ 82.984184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.990120] CR2: 0000000000000000 CR3: 0000000104d1c005 CR4: 00000000001706f0
[ 82.997458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 83.004798] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 83.012040] Call Trace:
[ 83.014711] <TASK>
[ 83.016922] bfq_bfqq_served+0xb0/0x1c0
[ 83.020986] bfq_dispatch_request+0x290/0x1230
[ 83.025539] ? sbitmap_get+0x86/0x190
[ 83.029414] __blk_mq_do_dispatch_sched+0x1d1/0x2e0
[ 83.034407] __blk_mq_sched_dispatch_requests+0xd8/0x130
[ 83.039831] blk_mq_sched_dispatch_requests+0x30/0x60
[ 83.044996] __blk_mq_run_hw_queue+0x2d/0x60
[ 83.049384] __blk_mq_delay_run_hw_queue+0x144/0x150
[ 83.054467] blk_mq_sched_insert_requests+0x63/0xe0
[ 83.059476] blk_mq_flush_plug_list+0xed/0x170
[ 83.064027] blk_mq_submit_bio+0x280/0x580
[ 83.068232] __submit_bio+0x1d3/0x200
[ 83.072003] ? __mod_lruvec_page_state+0x5d/0x90
[ 83.076790] ? unlock_page_memcg+0x18/0x70
[ 83.080999] submit_bio_noacct+0x245/0x280
[ 83.085225] iomap_submit_ioend+0x4e/0x80
[ 83.089349] iomap_do_writepage+0x4db/0x700
[ 83.093640] write_cache_pages+0x176/0x3c0
[ 83.097953] ? iomap_write_begin+0x4e0/0x4e0
[ 83.102689] iomap_writepages+0x1c/0x40
[ 83.106826] xfs_vm_writepages+0x6e/0x90 [xfs]
[ 83.111696] do_writepages+0xca/0x1e0
[ 83.115481] ? kblockd_mod_delayed_work_on+0x17/0x20
[ 83.120556] ? blk_mq_delay_run_hw_queues+0x49/0xd0
[ 83.125952] ? fprop_fraction_percpu+0x2b/0x70
[ 83.130619] ? __wb_calc_thresh+0x2a/0x100
[ 83.134827] __writeback_single_inode+0x39/0x280
[ 83.139556] writeback_sb_inodes+0x1d8/0x440
[ 83.143934] __writeback_inodes_wb+0x4c/0xe0
[ 83.148402] wb_writeback+0x1be/0x260
[ 83.152178] wb_workfn+0x2b4/0x4a0
[ 83.155776] ? check_preempt_curr+0x55/0x70
[ 83.160069] ? ttwu_do_wakeup+0x17/0x150
[ 83.164100] process_one_work+0x1f1/0x390
[ 83.168324] worker_thread+0x53/0x3e0
[ 83.172094] ? process_one_work+0x390/0x390
[ 83.176397] kthread+0x127/0x150
[ 83.179820] ? set_kthread_struct+0x40/0x40
[ 83.184113] ret_from_fork+0x22/0x30
[ 83.187802] </TASK>
[ 83.190097] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm snd_pcsp snd_pcm irqbypass rapl snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_net net_failover virtio_scsi serio_raw failover ipmi_devintf ipmi_msghandler fuse
[ 83.234465] CR2: 0000000000000000
[ 83.237889] ---[ end trace 96e388edab826ab0 ]---
[ 83.242613] RIP: 0010:bfq_put_idle_entity+0x1b/0x60
[ 83.247864] Code: 1f 44 00 00 8d 04 bf 01 c0 f7 d8 83 c0 50 c3 0f 1f 44 00 00 55 48 89 fd 53 48 89 f3 e8 2e fa ff ff 48 8b 43 68 48 83 7b 60 00 <48> 8b 00 c6 43 18 00 75 24 48 63 53 4c 48 8d bb 78 ff ff ff 48 29
[ 83.266916] RSP: 0000:ffffbb528094b5d0 EFLAGS: 00010082
[ 83.272334] RAX: 0000000000000000 RBX: ffff969fccc983d0 RCX: 0000000000000000
[ 83.279621] RDX: ffff969fccc98390 RSI: ffff969fccc983b0 RDI: ffff969fccc983d0
[ 83.286871] RBP: ffff969fcb6c9958 R08: ffff969fcb6c9960 R09: 0000000000000000
[ 83.294208] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000010
[ 83.301466] R13: ffff969fc4f5f3b0 R14: ffff969fc4f5f3b0 R15: ffff969fd21af3c0
[ 83.308790] FS: 0000000000000000(0000) GS:ffff969fec000000(0000) knlGS:0000000000000000
[ 83.316985] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 83.322880] CR2: 0000000000000000 CR3: 0000000104d1c005 CR4: 00000000001706f0
[ 83.330208] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 83.337450] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Full console output: console.txt
Still seeing some oopses on GCP. Is there any way to tell the status of that kernel patchset?
I think the patchset is still being reviewed. I believe this is v1 of the set: https://lore.kernel.org/all/20211223171425.3551-1-jack@suse.cz/
And now it appears to be up to v5 - https://lkml.kernel.org/linux-block/20220121105503.14069-1-jack@suse.cz/
(Not a kernel developer, so navigating patchsets on LKML is not my forte)
Noting that I haven't seen this in a while.
Saw a kernel oops on GCP today in kola-gcp#46 on kernel-5.16.13-200.fc35.x86_64:
[ 80.370395] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 80.377502] #PF: supervisor read access in kernel mode
[ 80.382747] #PF: error_code(0x0000) - not-present page
[ 80.388040] PGD 0 P4D 0
[ 80.390718] Oops: 0000 [#1] PREEMPT SMP PTI
[ 80.395032] CPU: 0 PID: 254 Comm: kworker/u2:3 Not tainted 5.16.13-200.fc35.x86_64 #1
[ 80.402967] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 80.412288] Workqueue: writeback wb_workfn (flush-8:0)
[ 80.417716] RIP: 0010:rb_erase+0x106/0x350
[ 80.421973] Code: 00 00 48 89 10 48 8b 02 a8 01 0f 84 35 02 00 00 48 83 e0 fc 75 01 c3 48 89 d1 48 89 c2 48 8b 42 08 48 39 c8 75 af 48 8b 42 10 <f6> 00 01 0f 84 1d 01 00 00 48 8b 70 10 48 85 f6 74 05 f6 06 01 74
[ 80.441055] RSP: 0018:ffffb755005cb608 EFLAGS: 00010046
[ 80.446545] RAX: 0000000000000000 RBX: ffff9adbcbae3158 RCX: 0000000000000000
[ 80.453794] RDX: ffff9adbc3473098 RSI: 0000000000000000 RDI: ffff9adbd0f6b898
[ 80.461041] RBP: ffff9adbd0f6b898 R08: ffff9adbcbae3160 R09: 0000000000000000
[ 80.468409] R10: 0000000000000012 R11: 0000000000000000 R12: ffff9adbd0f6b810
[ 80.475847] R13: ffff9adbd27fb180 R14: ffff9adbcd543420 R15: ffff9adbd0f02d00
[ 80.483086] FS: 0000000000000000(0000) GS:ffff9adbec000000(0000) knlGS:0000000000000000
[ 80.491278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.497130] CR2: 0000000000000000 CR3: 0000000103696004 CR4: 00000000001706f0
[ 80.504385] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 80.511709] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 80.518959] Call Trace:
[ 80.521613] <TASK>
[ 80.523821] bfq_idle_extract+0x40/0xb0
[ 80.527769] bfq_put_idle_entity+0x12/0x60
[ 80.531989] bfq_bfqq_served+0xb0/0x1a0
[ 80.535949] bfq_dispatch_request+0x2fa/0x12b0
[ 80.540506] ? sbitmap_get+0x86/0x190
[ 80.544276] __blk_mq_do_dispatch_sched+0x1d1/0x300
[ 80.549277] __blk_mq_sched_dispatch_requests+0xd6/0x130
[ 80.554697] blk_mq_sched_dispatch_requests+0x30/0x60
[ 80.559853] __blk_mq_run_hw_queue+0x30/0xa0
[ 80.564244] __blk_mq_delay_run_hw_queue+0x182/0x1b0
[ 80.569316] blk_mq_sched_insert_requests+0x68/0xf0
[ 80.574306] blk_mq_flush_plug_list+0x196/0x300
[ 80.578999] blk_mq_submit_bio+0x305/0x780
[ 80.583204] submit_bio_noacct+0x2a5/0x2c0
[ 80.587425] iomap_submit_ioend+0x4e/0x80
[ 80.591543] iomap_do_writepage+0x4d4/0x810
[ 80.595843] write_cache_pages+0x156/0x390
[ 80.600051] ? iomap_truncate_page+0x40/0x40
[ 80.604430] iomap_writepages+0x1c/0x40
[ 80.608372] xfs_vm_writepages+0x6c/0x90 [xfs]
[ 80.613040] do_writepages+0xbf/0x1b0
[ 80.616811] ? blkcg_rstat_flush+0x28/0x1f0
[ 80.621105] ? fprop_fraction_percpu+0x2b/0x70
[ 80.625754] ? __wb_calc_thresh+0x2a/0x100
[ 80.629976] __writeback_single_inode+0x3d/0x310
[ 80.634702] writeback_sb_inodes+0x1d4/0x450
[ 80.639083] __writeback_inodes_wb+0x4c/0xe0
[ 80.643459] wb_writeback+0x1c9/0x2a0
[ 80.647229] wb_workfn+0x2c3/0x4e0
[ 80.650738] ? check_preempt_curr+0x55/0x70
[ 80.655134] ? ttwu_do_wakeup+0x17/0x160
[ 80.659192] process_one_work+0x1e8/0x3c0
[ 80.663311] worker_thread+0x50/0x3b0
[ 80.667179] ? rescuer_thread+0x370/0x370
[ 80.671311] kthread+0x16b/0x190
[ 80.674650] ? set_kthread_struct+0x40/0x40
[ 80.678939] ret_from_fork+0x22/0x30
[ 80.682631] </TASK>
[ 80.684932] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm snd_pcsp irqbypass snd_pcm rapl snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables vfat fat xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover virtio_scsi serio_raw failover ipmi_devintf ipmi_msghandler fuse
[ 80.729456] CR2: 0000000000000000
[ 80.732877] ---[ end trace 900d3a15c2a8ae78 ]---
[ 80.737599] RIP: 0010:rb_erase+0x106/0x350
[ 80.741804] Code: 00 00 48 89 10 48 8b 02 a8 01 0f 84 35 02 00 00 48 83 e0 fc 75 01 c3 48 89 d1 48 89 c2 48 8b 42 08 48 39 c8 75 af 48 8b 42 10 <f6> 00 01 0f 84 1d 01 00 00 48 8b 70 10 48 85 f6 74 05 f6 06 01 74
[ 80.760767] RSP: 0018:ffffb755005cb608 EFLAGS: 00010046
[ 80.766113] RAX: 0000000000000000 RBX: ffff9adbcbae3158 RCX: 0000000000000000
[ 80.773439] RDX: ffff9adbc3473098 RSI: 0000000000000000 RDI: ffff9adbd0f6b898
[ 80.780677] RBP: ffff9adbd0f6b898 R08: ffff9adbcbae3160 R09: 0000000000000000
[ 80.787916] R10: 0000000000000012 R11: 0000000000000000 R12: ffff9adbd0f6b810
[ 80.795161] R13: ffff9adbd27fb180 R14: ffff9adbcd543420 R15: ffff9adbd0f02d00
[ 80.802413] FS: 0000000000000000(0000) GS:ffff9adbec000000(0000) knlGS:0000000000000000
[ 80.810696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.816555] CR2: 0000000000000000 CR3: 0000000103696004 CR4: 00000000001706f0
[ 80.823797] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 80.831049] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 80.838289] note: kworker/u2:3[254] exited with preempt_count 2
Full console output: console.txt
Saw this again today in kola-gcp#56 and kola-gcp#57.
Saw this again today in kola-gcp#100 with 35.20220406.20.0 and kernel-5.16.18-200.fc35.x86_64.
FTR, current patch series is https://lore.kernel.org/all/20220401102325.17617-1-jack@suse.cz/
We haven't seen this in some time. Maybe fixes landed upstream? Closing.
Yep, they're in 5.18+.
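A rough way to check whether a given build is past that point is to compare the kernel release string against 5.18. This is only a sketch: `kernel_at_least` is a hypothetical helper, and it looks at the upstream major.minor only, so it cannot tell whether a distribution kernel carries backported fixes.

```python
import re


def kernel_at_least(release: str, minimum: tuple[int, int] = (5, 18)) -> bool:
    """Compare the major.minor of a kernel release string against a minimum.

    Accepts strings such as "5.16.18-200.fc35.x86_64" (as printed by uname -r)
    or the RPM-style "kernel-5.16.18-200.fc35.x86_64" used in this thread.
    """
    match = re.match(r"(?:kernel-)?(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version >= minimum


# Kernels mentioned in this issue, all older than 5.18.
for rel in (
    "kernel-5.14.16-301.fc35.x86_64",
    "kernel-5.15.10-200.fc35.x86_64",
    "kernel-5.16.18-200.fc35.x86_64",
):
    print(rel, kernel_at_least(rel))
```

On a running machine the input could come from `platform.release()` in the standard library.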
A kernel panic has been sporadically observed by our testsuite on GCP. One failed run is at https://jenkins-fedora-coreos.apps.ocp.ci.centos.org/job/kola-gcp/671/, on testing-devel build 35.20211111.20.0 with kernel-5.14.16-301.fc35.x86_64. Full console output is attached here. Stacktrace is:
This happened in podman.network-single, which runs this: