coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

kernel: NULL pointer dereference in rb_insert_color, coming from bfq_deactivate_entity #1021

Closed lucab closed 1 year ago

lucab commented 2 years ago

Our test suite has sporadically hit a kernel panic on GCP. One failed run is at https://jenkins-fedora-coreos.apps.ocp.ci.centos.org/job/kola-gcp/671/, on testing-devel build 35.20211111.20.0 with kernel-5.14.16-301.fc35.x86_64. Full console output is attached here.

Stacktrace is:

[  111.996284] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  112.003402] #PF: supervisor read access in kernel mode
[  112.008765] #PF: error_code(0x0000) - not-present page
[  112.014027] PGD 0 P4D 0 
[  112.016699] Oops: 0000 [#1] SMP PTI
[  112.020308] CPU: 0 PID: 101 Comm: kworker/0:1H Not tainted 5.14.16-301.fc35.x86_64 #1
[  112.028261] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  112.037691] Workqueue: kblockd blk_mq_run_work_fn
[  112.042515] RIP: 0010:rb_insert_color+0x14/0x120
[  112.047250] Code: c0 75 eb 4c 89 c0 c3 45 31 c0 eb f7 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 85 c0 0f 84 b3 00 00 00 48 8b 10 f6 c2 01 75 5b <48> 8b 4a 08 48 39 c1 74 53 48 85 c9 74 05 f6 01 01 74 72 48 8b 48
[  112.066240] RSP: 0000:ffffb8fb0092bbf0 EFLAGS: 00010046
[  112.071603] RAX: ffff8f0b92e47098 RBX: ffff8f0b94179898 RCX: 0000000000000000
[  112.079197] RDX: 0000000000000000 RSI: ffff8f0b8b704960 RDI: ffff8f0b94179898
[  112.090358] RBP: 0000000000000000 R08: ffff8f0b8b6f2e50 R09: ffff8f0b8b6f2e50
[  112.097625] R10: 0000000000000031 R11: 0000000000000000 R12: ffff8f0b8b704958
[  112.104961] R13: 0000000000000001 R14: ffff8f0b8b704960 R15: ffff8f0b94179898
[  112.112235] FS:  0000000000000000(0000) GS:ffff8f0bac000000(0000) knlGS:0000000000000000
[  112.120564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  112.127116] CR2: 0000000000000008 CR3: 00000001020f2006 CR4: 00000000001706f0
[  112.134523] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  112.141782] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  112.149034] Call Trace:
[  112.151609]  __bfq_deactivate_entity+0x19b/0x240
[  112.157340]  bfq_deactivate_entity+0x4f/0xc0
[  112.161732]  bfq_del_bfqq_busy+0xc9/0x180
[  112.165862]  __bfq_bfqq_expire+0x95/0xc0
[  112.169904]  bfq_bfqq_expire+0x3bd/0x9a0
[  112.173949]  ? bfq_may_expire_for_budg_timeout+0x7f/0x1b0
[  112.179471]  ? bfq_active_extract+0x8e/0x140
[  112.184078]  ? bfq_bfqq_served+0xb0/0x1c0
[  112.188266]  bfq_dispatch_request+0x3fd/0x1220
[  112.193104]  ? sbitmap_get+0x86/0x190
[  112.197043]  __blk_mq_do_dispatch_sched+0x1d1/0x2e0
[  112.202053]  __blk_mq_sched_dispatch_requests+0xd8/0x130
[  112.208354]  blk_mq_sched_dispatch_requests+0x30/0x60
[  112.213642]  __blk_mq_run_hw_queue+0x2d/0x60
[  112.218132]  process_one_work+0x1ec/0x390
[  112.222271]  worker_thread+0x53/0x3e0
[  112.226197]  ? process_one_work+0x390/0x390
[  112.230494]  kthread+0x127/0x150
[  112.233962]  ? set_kthread_struct+0x40/0x40
[  112.238264]  ret_from_fork+0x22/0x30
[  112.241980] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass snd_pcsp snd_pcm rapl snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_net net_failover virtio_scsi serio_raw failover ipmi_devintf ipmi_msghandler fuse
[  112.287124] CR2: 0000000000000008
[  112.290661] ---[ end trace 07deb3865fa00f77 ]---
[  112.295392] RIP: 0010:rb_insert_color+0x14/0x120
[  112.300142] Code: c0 75 eb 4c 89 c0 c3 45 31 c0 eb f7 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 85 c0 0f 84 b3 00 00 00 48 8b 10 f6 c2 01 75 5b <48> 8b 4a 08 48 39 c1 74 53 48 85 c9 74 05 f6 01 01 74 72 48 8b 48
[  112.319474] RSP: 0000:ffffb8fb0092bbf0 EFLAGS: 00010046
[  112.324818] RAX: ffff8f0b92e47098 RBX: ffff8f0b94179898 RCX: 0000000000000000
[  112.332772] RDX: 0000000000000000 RSI: ffff8f0b8b704960 RDI: ffff8f0b94179898
[  112.341241] RBP: 0000000000000000 R08: ffff8f0b8b6f2e50 R09: ffff8f0b8b6f2e50
[  112.348516] R10: 0000000000000031 R11: 0000000000000000 R12: ffff8f0b8b704958
[  112.355865] R13: 0000000000000001 R14: ffff8f0b8b704960 R15: ffff8f0b94179898
[  112.363118] FS:  0000000000000000(0000) GS:ffff8f0bac000000(0000) knlGS:0000000000000000
[  112.371530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  112.377466] CR2: 0000000000000008 CR3: 00000001020f2006 CR4: 00000000001706f0
[  112.384726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  112.391999] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

This happened in podman.network-single, which runs this:

for i in $(seq 1 100); do
    echo -n "$i: "
    sudo podman run --rm ping sh -c 'ping -i 0.2 10.88.0.1 -w 1 >/dev/null && echo PASS || echo FAIL'
done

lucab commented 2 years ago

Same kernel also hit another GPF the day before, and it seems to be coming from the same BFQ logic:

[   71.653536] general protection fault, probably for non-canonical address 0x1010006000001bd: 0000 [#1] SMP PTI
[   71.663618] CPU: 0 PID: 253 Comm: kworker/u2:3 Not tainted 5.14.16-301.fc35.x86_64 #1
[   71.671643] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   71.681067] Workqueue: writeback wb_workfn (flush-8:0)
[   71.686354] RIP: 0010:__bfq_deactivate_entity+0x15c/0x240
[   71.691950] Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
[   71.711025] RSP: 0018:ffffb656c015b538 EFLAGS: 00010006
[   71.716374] RAX: 0101000600000195 RBX: ffff9bea12d6e098 RCX: 0101000600000195
[   71.723628] RDX: ffff9bea0b4453c8 RSI: ffff9bea0b4453d0 RDI: 00000009f8ea36ed
[   71.730961] RBP: 0000000000000000 R08: ffff9bea1211b4e0 R09: ffff9bea1211b4e0
[   71.738217] R10: 0000000000000026 R11: 0000000000000000 R12: ffff9bea0b70f958
[   71.745470] R13: 0000000000000001 R14: ffff9bea0b70f960 R15: ffff9bea12d6e098
[   71.752715] FS:  0000000000000000(0000) GS:ffff9bea2c000000(0000) knlGS:0000000000000000
[   71.760916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.766771] CR2: 000056063b242000 CR3: 0000000112f2a004 CR4: 00000000001706f0
[   71.774032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   71.781281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   71.788521] Call Trace:
[   71.791091]  bfq_deactivate_entity+0x4f/0xc0
[   71.795484]  bfq_del_bfqq_busy+0xc9/0x180
[   71.799604]  __bfq_bfqq_expire+0x95/0xc0
[   71.803644]  bfq_bfqq_expire+0x3bd/0x9a0
[   71.807689]  ? bfq_may_expire_for_budg_timeout+0x7f/0x1b0
[   71.813213]  ? bfq_active_extract+0x8e/0x140
[   71.817592]  ? bfq_bfqq_served+0xb0/0x1c0
[   71.821707]  bfq_dispatch_request+0x3fd/0x1220
[   71.826268]  ? sbitmap_get+0x86/0x190
[   71.830052]  __blk_mq_do_dispatch_sched+0x1d1/0x2e0
[   71.835047]  __blk_mq_sched_dispatch_requests+0xd8/0x130
[   71.840492]  blk_mq_sched_dispatch_requests+0x30/0x60
[   71.845654]  __blk_mq_run_hw_queue+0x2d/0x60
[   71.850031]  __blk_mq_delay_run_hw_queue+0x144/0x150
[   71.855104]  blk_mq_sched_insert_requests+0x63/0xe0
[   71.860104]  blk_mq_flush_plug_list+0xed/0x170
[   71.864827]  blk_mq_submit_bio+0x280/0x580
[   71.869030]  submit_bio_noacct+0x410/0x4e0
[   71.873246]  ? unlock_page_memcg+0x18/0x70
[   71.877453]  iomap_submit_ioend+0x4e/0x80
[   71.881574]  iomap_do_writepage+0x455/0x730
[   71.885927]  write_cache_pages+0x173/0x3b0
[   71.890148]  ? iomap_write_begin+0x460/0x460
[   71.894557]  iomap_writepages+0x1c/0x40
[   71.898503]  xfs_vm_writepages+0x6e/0x90 [xfs]
[   71.903261]  do_writepages+0x31/0xb0
[   71.906945]  ? __wb_calc_thresh+0x2a/0x100
[   71.911245]  ? wb_calc_thresh+0x41/0x50
[   71.915205]  __writeback_single_inode+0x39/0x280
[   71.919957]  writeback_sb_inodes+0x1d8/0x440
[   71.924335]  __writeback_inodes_wb+0x4c/0xe0
[   71.928715]  wb_writeback+0x1da/0x280
[   71.932495]  wb_workfn+0x2b4/0x4a0
[   71.936025]  ? check_preempt_curr+0x55/0x70
[   71.940334]  ? ttwu_do_wakeup+0x17/0x150
[   71.944370]  process_one_work+0x1ec/0x390
[   71.948501]  worker_thread+0x53/0x3e0
[   71.952274]  ? process_one_work+0x390/0x390
[   71.956652]  kthread+0x127/0x150
[   71.959999]  ? set_kthread_struct+0x40/0x40
[   71.964293]  ret_from_fork+0x22/0x30
[   71.967984] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass snd_pcsp snd_pcm rapl snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover failover serio_raw virtio_scsi ipmi_devintf ipmi_msghandler fuse
[   72.011990] ---[ end trace 34d26a14750edabd ]---
[   72.016730] RIP: 0010:__bfq_deactivate_entity+0x15c/0x240
[   72.022354] Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
[   72.041234] RSP: 0018:ffffb656c015b538 EFLAGS: 00010006
[   72.046878] RAX: 0101000600000195 RBX: ffff9bea12d6e098 RCX: 0101000600000195
[   72.054124] RDX: ffff9bea0b4453c8 RSI: ffff9bea0b4453d0 RDI: 00000009f8ea36ed
[   72.061373] RBP: 0000000000000000 R08: ffff9bea1211b4e0 R09: ffff9bea1211b4e0
[   72.068629] R10: 0000000000000026 R11: 0000000000000000 R12: ffff9bea0b70f958
[   72.075883] R13: 0000000000000001 R14: ffff9bea0b70f960 R15: ffff9bea12d6e098
[   72.083248] FS:  0000000000000000(0000) GS:ffff9bea2c000000(0000) knlGS:0000000000000000
[   72.091462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.097335] CR2: 000056063b242000 CR3: 0000000112f2a004 CR4: 00000000001706f0
[   72.104682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   72.113512] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

lucab commented 2 years ago

Forwarded to the Fedora bugzilla, https://bugzilla.redhat.com/show_bug.cgi?id=2022819.

miabbott commented 2 years ago

Mindlessly searching for the GPF with bfq_deactivate_entity turned up https://lkml.org/lkml/2020/10/9/1120

Vogtinator commented 2 years ago

openQA found that as well on openSUSE Tumbleweed, both in 5.14.14 and 5.15.2: https://bugzilla.opensuse.org/show_bug.cgi?id=1192714

In most of the failure cases there's no GPF or NULL deref message, the system just freezes completely (cursor stops blinking).

I was able to get a KASAN use-after-free report (also in the openSUSE bug report):

KASAN report

```
[  235.949241] ==================================================================
[  235.950326] BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
[  235.951369] Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544
[  235.953476] CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G E 5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
[  235.955726] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[  235.958007] Call Trace:
[  235.959157]
[  235.960287]  dump_stack_lvl+0x46/0x5a
[  235.961412]  print_address_description.constprop.0+0x1f/0x140
[  235.962556]  ? __bfq_deactivate_entity+0x9cb/0xa50
[  235.963707]  kasan_report.cold+0x7f/0x11b
[  235.964841]  ? __bfq_deactivate_entity+0x9cb/0xa50
[  235.965970]  __bfq_deactivate_entity+0x9cb/0xa50
[  235.967092]  ? update_curr+0x32f/0x5d0
[  235.968227]  bfq_deactivate_entity+0xa0/0x1d0
[  235.969365]  bfq_del_bfqq_busy+0x28a/0x420
[  235.970481]  ? resched_curr+0x116/0x1d0
[  235.971573]  ? bfq_requeue_bfqq+0x70/0x70
[  235.972657]  ? check_preempt_wakeup+0x52b/0xbc0
[  235.973748]  __bfq_bfqq_expire+0x1a2/0x270
[  235.974822]  bfq_bfqq_expire+0xd16/0x2160
[  235.975893]  ? try_to_wake_up+0x4ee/0x1260
[  235.976965]  ? bfq_end_wr_async_queues+0xe0/0xe0
[  235.978039]  ? _raw_write_unlock_bh+0x60/0x60
[  235.979105]  ? _raw_spin_lock_irq+0x81/0xe0
[  235.980162]  bfq_idle_slice_timer+0x109/0x280
[  235.981199]  ? bfq_dispatch_request+0x4870/0x4870
[  235.982220]  __hrtimer_run_queues+0x37d/0x700
[  235.983242]  ? enqueue_hrtimer+0x1b0/0x1b0
[  235.984278]  ? kvm_clock_get_cycles+0xd/0x10
[  235.985301]  ? ktime_get_update_offsets_now+0x6f/0x280
[  235.986317]  hrtimer_interrupt+0x2c8/0x740
[  235.987321]  __sysvec_apic_timer_interrupt+0xcd/0x260
[  235.988357]  sysvec_apic_timer_interrupt+0x6a/0x90
[  235.989373]
[  235.990355]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  235.991366] RIP: 0010:do_seccomp+0x4f5/0x1f40
[  235.992376] Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 cb 14 00 00 48 8b bd d8 0b 00 00 c6 07 00 0f 1f 40 00 fb 66 0f 1f 44 00 00 <8b> 4c 24 30 85 c9 0f 85 06 07 00 00 8b 54 24 04 85 d2 74 19 4d 85
[  235.994481] RSP: 0018:ffffc900020cfd48 EFLAGS: 00000246
[  235.995546] RAX: dffffc0000000000 RBX: 1ffff92000419fb1 RCX: ffffffffb9a8d89d
[  235.996638] RDX: 1ffff1100080f17b RSI: 0000000000000008 RDI: ffff888008c56040
[  235.997717] RBP: ffff888004078000 R08: 0000000000000001 R09: ffff88800407800f
[  235.998784] R10: ffffed100080f001 R11: 0000000000000001 R12: 00000000ffffffff
[  235.999852] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  236.000906]  ? do_seccomp+0xfed/0x1f40
[  236.001937]  ? do_seccomp+0xfed/0x1f40
[  236.002938]  ? get_nth_filter+0x2e0/0x2e0
[  236.003932]  ? security_task_prctl+0x66/0xd0
[  236.004910]  __do_sys_prctl+0x420/0xd60
[  236.005842]  ? handle_mm_fault+0x196/0x610
[  236.006739]  ? __ia32_compat_sys_getrusage+0x90/0x90
[  236.007611]  ? up_read+0x15/0x90
[  236.008477]  do_syscall_64+0x5c/0x80
[  236.009349]  ? exc_page_fault+0x60/0xc0
[  236.010219]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  236.011094] RIP: 0033:0x561fa9ceec6a
[  236.011976] Code: e8 db 46 f8 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
[  236.013823] RSP: 002b:000000c000116e38 EFLAGS: 00000216 ORIG_RAX: 000000000000009d
[  236.014778] RAX: ffffffffffffffda RBX: 000000c000028000 RCX: 0000561fa9ceec6a
[  236.015748] RDX: 000000c000116ee0 RSI: 0000000000000002 RDI: 0000000000000016
[  236.016716] RBP: 000000c000116e90 R08: 0000000000000000 R09: 0000000000000000
[  236.017685] R10: 0000000000000000 R11: 0000000000000216 R12: 00000000000000b8
[  236.018645] R13: 00000000000000b7 R14: 0000000000000200 R15: 0000000000000004
[  236.020558] Allocated by task 485:
[  236.021511]  kasan_save_stack+0x1b/0x40
[  236.022460]  __kasan_kmalloc+0xa4/0xd0
[  236.023410]  bfq_pd_alloc+0xa8/0x170
[  236.024351]  blkg_alloc+0x397/0x540
[  236.025287]  blkg_create+0x66b/0xcd0
[  236.026219]  bio_associate_blkg_from_css+0x43c/0xb20
[  236.027161]  bio_associate_blkg+0x66/0x100
[  236.028098]  submit_extent_page+0x744/0x1380 [btrfs]
[  236.029126]  __extent_writepage_io+0x605/0xaa0 [btrfs]
[  236.030113]  __extent_writepage+0x360/0x740 [btrfs]
[  236.031093]  extent_write_cache_pages+0x5a7/0xa50 [btrfs]
[  236.032084]  extent_writepages+0xcb/0x1a0 [btrfs]
[  236.033063]  do_writepages+0x188/0x720
[  236.033997]  filemap_fdatawrite_wbc+0x19f/0x2b0
[  236.034929]  filemap_fdatawrite_range+0x99/0xd0
[  236.035855]  btrfs_fdatawrite_range+0x46/0xf0 [btrfs]
[  236.036833]  start_ordered_ops.constprop.0+0xb6/0x110 [btrfs]
[  236.037803]  btrfs_sync_file+0x1bf/0xe70 [btrfs]
[  236.038747]  __x64_sys_fsync+0x51/0x80
[  236.039622]  do_syscall_64+0x5c/0x80
[  236.040468]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  236.042137] Freed by task 10561:
[  236.042966]  kasan_save_stack+0x1b/0x40
[  236.043802]  kasan_set_track+0x1c/0x30
[  236.044628]  kasan_set_free_info+0x20/0x30
[  236.045437]  __kasan_slab_free+0x10b/0x140
[  236.046256]  slab_free_freelist_hook+0x8e/0x180
[  236.047081]  kfree+0xc7/0x400
[  236.047907]  blkg_free.part.0+0x78/0xf0
[  236.048736]  rcu_do_batch+0x365/0x1280
[  236.049558]  rcu_core+0x493/0x8d0
[  236.050376]  __do_softirq+0x18e/0x544
[  236.051992] The buggy address belongs to the object at ffff88800693c000 which belongs to the cache kmalloc-2k of size 2048
[  236.053672] The buggy address is located 192 bytes inside of 2048-byte region [ffff88800693c000, ffff88800693c800)
[  236.055328] The buggy address belongs to the page:
[  236.056136] page:00000000544d2d6e refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x6938
[  236.056954] head:00000000544d2d6e order:3 compound_mapcount:0 compound_pincount:0
[  236.057764] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
[  236.058588] raw: 000fffffc0010200 dead000000000100 dead000000000122 ffff888001042000
[  236.059439] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[  236.060293] page dumped because: kasan: bad access detected
[  236.062000] Memory state around the buggy address:
[  236.062862]  ffff88800693bf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  236.063756]  ffff88800693c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  236.064645] >ffff88800693c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  236.065525]                                            ^
[  236.066412]  ffff88800693c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  236.067333]  ffff88800693c180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  236.068240] ==================================================================
[  236.069174] Disabling lock debugging due to kernel taint
```

Vogtinator commented 2 years ago

FYI, the result of the investigation so far is https://lore.kernel.org/all/20211125172809.GC19572@quack2.suse.cz/

lucab commented 2 years ago

Thanks to the openSUSE folks, there is an initial patch going through review at https://patchwork.kernel.org/project/linux-block/list/?series=588567&state=%2A&archive=both.

dustymabe commented 2 years ago

Just for completeness: we saw a similar oops in kola-gcp#704 when testing 35.20211209.20.0.

[  135.820793] BUG: unable to handle page fault for address: 0000000100000008
[  135.827815] #PF: supervisor read access in kernel mode
[  135.833063] #PF: error_code(0x0000) - not-present page
[  135.838478] PGD 0 P4D 0 
[  135.841137] Oops: 0000 [#1] SMP PTI
[  135.844747] CPU: 0 PID: 110 Comm: kworker/u2:2 Not tainted 5.15.6-200.fc35.x86_64 #1
[  135.852599] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  135.861952] Workqueue: writeback wb_workfn (flush-8:0)
[  135.867220] RIP: 0010:rb_next+0x21/0x50
[  135.871167] Code: 00 00 c3 66 0f 1f 44 00 00 48 8b 17 48 39 d7 74 35 48 8b 47 08 48 85 c0 74 1c 49 89 c0 48 8b 40 10 48 85 c0 75 f4 4c 89 c0 c3 <48> 3b 78 08 75 f6 48 8b 10 48 89 c7 48 89 d0 48 83 e0 fc 49 89 c0
[  135.890307] RSP: 0018:ffffa9b7c094b5a8 EFLAGS: 00010006
[  135.895668] RAX: 0000000100000000 RBX: ffff8ccd8d7ef158 RCX: 0000000000000000
[  135.903013] RDX: 0000000100000001 RSI: ffff8ccd86a66898 RDI: ffff8ccd86a66898
[  135.910434] RBP: ffff8ccd86a66898 R08: 0000000100000000 R09: 0000000000000000
[  135.917869] R10: 0000000000000015 R11: 0000000000000000 R12: 0000000000000000
[  135.925217] R13: ffff8ccd86a208c0 R14: ffff8ccd86a208c0 R15: ffff8ccd8da21e00
[  135.932463] FS:  0000000000000000(0000) GS:ffff8ccdac000000(0000) knlGS:0000000000000000
[  135.940663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.946530] CR2: 0000000100000008 CR3: 0000000106b5e004 CR4: 00000000001706f0
[  135.953781] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  135.961056] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  135.968300] Call Trace:
[  135.970884]  <TASK>
[  135.973103]  bfq_idle_extract+0xa6/0xb0
[  135.977095]  bfq_put_idle_entity+0x12/0x60
[  135.981312]  bfq_bfqq_served+0xb0/0x1c0
[  135.985270]  bfq_dispatch_request+0x294/0x1220
[  135.989875]  ? sbitmap_get+0x86/0x190
[  135.993841]  __blk_mq_do_dispatch_sched+0x1d1/0x2e0
[  135.998864]  __blk_mq_sched_dispatch_requests+0xd8/0x130
[  136.004306]  blk_mq_sched_dispatch_requests+0x30/0x60
[  136.009488]  __blk_mq_run_hw_queue+0x2d/0x60
[  136.013894]  __blk_mq_delay_run_hw_queue+0x144/0x150
[  136.018984]  blk_mq_sched_insert_requests+0x63/0xe0
[  136.024006]  blk_mq_flush_plug_list+0xed/0x170
[  136.028568]  blk_mq_submit_bio+0x280/0x580
[  136.032787]  __submit_bio+0x1d3/0x200
[  136.036569]  ? __mod_lruvec_page_state+0x5d/0x90
[  136.041295]  ? unlock_page_memcg+0x18/0x70
[  136.045523]  submit_bio_noacct+0x245/0x280
[  136.049750]  iomap_submit_ioend+0x4e/0x80
[  136.053883]  iomap_do_writepage+0x4db/0x700
[  136.058312]  write_cache_pages+0x176/0x3c0
[  136.062635]  ? iomap_write_begin+0x4e0/0x4e0
[  136.067021]  iomap_writepages+0x1c/0x40
[  136.070992]  xfs_vm_writepages+0x6e/0x90 [xfs]
[  136.075669]  do_writepages+0xca/0x1e0
[  136.079447]  ? usleep_range+0x60/0x60
[  136.083217]  ? __schedule+0x31c/0x1500
[  136.087071]  ? fprop_fraction_percpu+0x2b/0x70
[  136.091657]  ? __wb_calc_thresh+0x2a/0x100
[  136.095861]  __writeback_single_inode+0x39/0x280
[  136.100587]  writeback_sb_inodes+0x1d8/0x440
[  136.105065]  __writeback_inodes_wb+0x4c/0xe0
[  136.109541]  wb_writeback+0x1be/0x260
[  136.113779]  wb_workfn+0x2b4/0x4a0
[  136.117312]  ? check_preempt_curr+0x55/0x70
[  136.121633]  ? ttwu_do_wakeup+0x17/0x150
[  136.126020]  process_one_work+0x1f1/0x390
[  136.130166]  worker_thread+0x53/0x3e0
[  136.133957]  ? process_one_work+0x390/0x390
[  136.138422]  kthread+0x127/0x150
[  136.141772]  ? set_kthread_struct+0x40/0x40
[  136.146075]  ret_from_fork+0x22/0x30
[  136.149781]  </TASK>
[  136.152164] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass snd_pcsp rapl snd_pcm snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_net net_failover virtio_scsi serio_raw failover ipmi_devintf ipmi_msghandler fuse
[  136.197713] CR2: 0000000100000008
[  136.201168] ---[ end trace 3fd70f5585fa105d ]---
[  136.205998] RIP: 0010:rb_next+0x21/0x50
[  136.209977] Code: 00 00 c3 66 0f 1f 44 00 00 48 8b 17 48 39 d7 74 35 48 8b 47 08 48 85 c0 74 1c 49 89 c0 48 8b 40 10 48 85 c0 75 f4 4c 89 c0 c3 <48> 3b 78 08 75 f6 48 8b 10 48 89 c7 48 89 d0 48 83 e0 fc 49 89 c0
[  136.229902] RSP: 0018:ffffa9b7c094b5a8 EFLAGS: 00010006
[  136.235245] RAX: 0000000100000000 RBX: ffff8ccd8d7ef158 RCX: 0000000000000000
[  136.242492] RDX: 0000000100000001 RSI: ffff8ccd86a66898 RDI: ffff8ccd86a66898
[  136.249866] RBP: ffff8ccd86a66898 R08: 0000000100000000 R09: 0000000000000000
[  136.257248] R10: 0000000000000015 R11: 0000000000000000 R12: 0000000000000000
[  136.264519] R13: ffff8ccd86a208c0 R14: ffff8ccd86a208c0 R15: ffff8ccd8da21e00
[  136.271779] FS:  0000000000000000(0000) GS:ffff8ccdac000000(0000) knlGS:0000000000000000
[  136.280082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  136.286084] CR2: 0000000100000008 CR3: 0000000106b5e004 CR4: 00000000001706f0
[  136.293340] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  136.300834] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Full console output: console.txt
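
A rough triage sketch for comparing traces like the ones above: it pulls the faulting symbol from the first kernel-mode RIP line and the reliable frames (those without a leading `?`) from the call trace, then checks whether any of them is a BFQ function. The helper names `summarize_oops` and `looks_like_bfq` are made up for illustration, and the regexes only assume the console format seen in these logs.

```python
import re

# Timestamp prefix such as "[  135.867220] " (assumed console format).
TS = r"^\[\s*\d+\.\d+\]\s*"
# Kernel-mode RIP line, e.g. "RIP: 0010:rb_next+0x21/0x50".
RIP_RE = re.compile(TS + r"RIP: 0010:([\w.]+)\+0x")
# Call-trace frame, optionally marked "? " when the unwinder is unsure.
FRAME_RE = re.compile(TS + r"(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log_text):
    """Return (faulting_symbol, reliable_frames) for the first oops in log_text."""
    rip = None
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        m = RIP_RE.match(line)
        if m and rip is None:
            rip = m.group(1)
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "Modules linked in:" in line:
            # The module list follows the call trace in these logs.
            in_trace = False
            continue
        if in_trace:
            m = FRAME_RE.match(line)
            if m and not m.group(1):  # skip "? " (unreliable) frames
                frames.append(m.group(2))
    return rip, frames


def looks_like_bfq(rip, frames):
    """True if the faulting symbol or any reliable frame is a BFQ function."""
    symbols = [rip or ""] + frames
    return any(s.startswith(("bfq_", "__bfq_")) for s in symbols)
```

Running `summarize_oops` over a saved console.txt and passing the result to `looks_like_bfq` gives a quick yes/no, though it is only a heuristic: a crash inside `rb_next` or `rb_erase` is attributed to BFQ purely because BFQ frames appear in the trace.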

dustymabe commented 2 years ago

Saw something similar today in kola-gcp#757 with testing-devel (35.20220120.20.0) and kernel-5.15.10-200.fc35.x86_64:

[   82.862358] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   82.869594] #PF: supervisor read access in kernel mode
[   82.874937] #PF: error_code(0x0000) - not-present page
[   82.880276] PGD 0 P4D 0 
[   82.882934] Oops: 0000 [#1] SMP PTI
[   82.886542] CPU: 0 PID: 110 Comm: kworker/u2:2 Not tainted 5.15.10-200.fc35.x86_64 #1
[   82.894702] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   82.904124] Workqueue: writeback wb_workfn (flush-8:0)
[   82.909389] RIP: 0010:bfq_put_idle_entity+0x1b/0x60
[   82.914645] Code: 1f 44 00 00 8d 04 bf 01 c0 f7 d8 83 c0 50 c3 0f 1f 44 00 00 55 48 89 fd 53 48 89 f3 e8 2e fa ff ff 48 8b 43 68 48 83 7b 60 00 <48> 8b 00 c6 43 18 00 75 24 48 63 53 4c 48 8d bb 78 ff ff ff 48 29
[   82.933715] RSP: 0000:ffffbb528094b5d0 EFLAGS: 00010082
[   82.939056] RAX: 0000000000000000 RBX: ffff969fccc983d0 RCX: 0000000000000000
[   82.946469] RDX: ffff969fccc98390 RSI: ffff969fccc983b0 RDI: ffff969fccc983d0
[   82.953821] RBP: ffff969fcb6c9958 R08: ffff969fcb6c9960 R09: 0000000000000000
[   82.961332] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000010
[   82.968749] R13: ffff969fc4f5f3b0 R14: ffff969fc4f5f3b0 R15: ffff969fd21af3c0
[   82.975989] FS:  0000000000000000(0000) GS:ffff969fec000000(0000) knlGS:0000000000000000
[   82.984184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   82.990120] CR2: 0000000000000000 CR3: 0000000104d1c005 CR4: 00000000001706f0
[   82.997458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   83.004798] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   83.012040] Call Trace:
[   83.014711]  <TASK>
[   83.016922]  bfq_bfqq_served+0xb0/0x1c0
[   83.020986]  bfq_dispatch_request+0x290/0x1230
[   83.025539]  ? sbitmap_get+0x86/0x190
[   83.029414]  __blk_mq_do_dispatch_sched+0x1d1/0x2e0
[   83.034407]  __blk_mq_sched_dispatch_requests+0xd8/0x130
[   83.039831]  blk_mq_sched_dispatch_requests+0x30/0x60
[   83.044996]  __blk_mq_run_hw_queue+0x2d/0x60
[   83.049384]  __blk_mq_delay_run_hw_queue+0x144/0x150
[   83.054467]  blk_mq_sched_insert_requests+0x63/0xe0
[   83.059476]  blk_mq_flush_plug_list+0xed/0x170
[   83.064027]  blk_mq_submit_bio+0x280/0x580
[   83.068232]  __submit_bio+0x1d3/0x200
[   83.072003]  ? __mod_lruvec_page_state+0x5d/0x90
[   83.076790]  ? unlock_page_memcg+0x18/0x70
[   83.080999]  submit_bio_noacct+0x245/0x280
[   83.085225]  iomap_submit_ioend+0x4e/0x80
[   83.089349]  iomap_do_writepage+0x4db/0x700
[   83.093640]  write_cache_pages+0x176/0x3c0
[   83.097953]  ? iomap_write_begin+0x4e0/0x4e0
[   83.102689]  iomap_writepages+0x1c/0x40
[   83.106826]  xfs_vm_writepages+0x6e/0x90 [xfs]
[   83.111696]  do_writepages+0xca/0x1e0
[   83.115481]  ? kblockd_mod_delayed_work_on+0x17/0x20
[   83.120556]  ? blk_mq_delay_run_hw_queues+0x49/0xd0
[   83.125952]  ? fprop_fraction_percpu+0x2b/0x70
[   83.130619]  ? __wb_calc_thresh+0x2a/0x100
[   83.134827]  __writeback_single_inode+0x39/0x280
[   83.139556]  writeback_sb_inodes+0x1d8/0x440
[   83.143934]  __writeback_inodes_wb+0x4c/0xe0
[   83.148402]  wb_writeback+0x1be/0x260
[   83.152178]  wb_workfn+0x2b4/0x4a0
[   83.155776]  ? check_preempt_curr+0x55/0x70
[   83.160069]  ? ttwu_do_wakeup+0x17/0x150
[   83.164100]  process_one_work+0x1f1/0x390
[   83.168324]  worker_thread+0x53/0x3e0
[   83.172094]  ? process_one_work+0x390/0x390
[   83.176397]  kthread+0x127/0x150
[   83.179820]  ? set_kthread_struct+0x40/0x40
[   83.184113]  ret_from_fork+0x22/0x30
[   83.187802]  </TASK>
[   83.190097] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm snd_pcsp snd_pcm irqbypass rapl snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_net net_failover virtio_scsi serio_raw failover ipmi_devintf ipmi_msghandler fuse
[   83.234465] CR2: 0000000000000000
[   83.237889] ---[ end trace 96e388edab826ab0 ]---
[   83.242613] RIP: 0010:bfq_put_idle_entity+0x1b/0x60
[   83.247864] Code: 1f 44 00 00 8d 04 bf 01 c0 f7 d8 83 c0 50 c3 0f 1f 44 00 00 55 48 89 fd 53 48 89 f3 e8 2e fa ff ff 48 8b 43 68 48 83 7b 60 00 <48> 8b 00 c6 43 18 00 75 24 48 63 53 4c 48 8d bb 78 ff ff ff 48 29
[   83.266916] RSP: 0000:ffffbb528094b5d0 EFLAGS: 00010082
[   83.272334] RAX: 0000000000000000 RBX: ffff969fccc983d0 RCX: 0000000000000000
[   83.279621] RDX: ffff969fccc98390 RSI: ffff969fccc983b0 RDI: ffff969fccc983d0
[   83.286871] RBP: ffff969fcb6c9958 R08: ffff969fcb6c9960 R09: 0000000000000000
[   83.294208] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000010
[   83.301466] R13: ffff969fc4f5f3b0 R14: ffff969fc4f5f3b0 R15: ffff969fd21af3c0
[   83.308790] FS:  0000000000000000(0000) GS:ffff969fec000000(0000) knlGS:0000000000000000
[   83.316985] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.322880] CR2: 0000000000000000 CR3: 0000000104d1c005 CR4: 00000000001706f0
[   83.330208] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   83.337450] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Full console output: console.txt

dustymabe commented 2 years ago

Still seeing some oopses on GCP. Is there any way to tell the status of that kernel patchset?

miabbott commented 2 years ago

I think the patchset is still being reviewed; I think I found v1 of the set - https://lore.kernel.org/all/20211223171425.3551-1-jack@suse.cz/

And now it appears to be up to v5 - https://lkml.kernel.org/linux-block/20220121105503.14069-1-jack@suse.cz/

(Not a kernel developer, so navigating patchsets on LKML is not my forte)
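
While the patchset is under review, it may be useful to confirm which block devices on a host actually use BFQ. The sketch below reads the standard `queue/scheduler` sysfs attribute, where the active scheduler is shown in square brackets (for example `mq-deadline kyber [bfq] none`). The function names are hypothetical, and nothing in this thread states how the affected CI hosts were configured, so treat it as a diagnostic aid only.

```python
import glob
import re


def active_scheduler(contents):
    """Return the bracketed (active) entry of a queue/scheduler line, or None."""
    m = re.search(r"\[([^\]]+)\]", contents)
    return m.group(1) if m else None


def devices_using(name="bfq", pattern="/sys/block/*/queue/scheduler"):
    """List block device names whose active I/O scheduler equals `name`."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                if active_scheduler(f.read()) == name:
                    # Path looks like /sys/block/<dev>/queue/scheduler.
                    hits.append(path.split("/")[3])
        except OSError:
            # Device vanished or attribute unreadable; skip it.
            continue
    return hits
```

Calling `devices_using()` on a running system returns something like a list of device names, or an empty list when no device has BFQ selected.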

dustymabe commented 2 years ago

Noting that I haven't seen this in a while.

dustymabe commented 2 years ago

Saw a kernel oops on GCP today in kola-gcp#46 on kernel-5.16.13-200.fc35.x86_64:

[   80.370395] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   80.377502] #PF: supervisor read access in kernel mode
[   80.382747] #PF: error_code(0x0000) - not-present page
[   80.388040] PGD 0 P4D 0 
[   80.390718] Oops: 0000 [#1] PREEMPT SMP PTI
[   80.395032] CPU: 0 PID: 254 Comm: kworker/u2:3 Not tainted 5.16.13-200.fc35.x86_64 #1
[   80.402967] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   80.412288] Workqueue: writeback wb_workfn (flush-8:0)
[   80.417716] RIP: 0010:rb_erase+0x106/0x350
[   80.421973] Code: 00 00 48 89 10 48 8b 02 a8 01 0f 84 35 02 00 00 48 83 e0 fc 75 01 c3 48 89 d1 48 89 c2 48 8b 42 08 48 39 c8 75 af 48 8b 42 10 <f6> 00 01 0f 84 1d 01 00 00 48 8b 70 10 48 85 f6 74 05 f6 06 01 74
[   80.441055] RSP: 0018:ffffb755005cb608 EFLAGS: 00010046
[   80.446545] RAX: 0000000000000000 RBX: ffff9adbcbae3158 RCX: 0000000000000000
[   80.453794] RDX: ffff9adbc3473098 RSI: 0000000000000000 RDI: ffff9adbd0f6b898
[   80.461041] RBP: ffff9adbd0f6b898 R08: ffff9adbcbae3160 R09: 0000000000000000
[   80.468409] R10: 0000000000000012 R11: 0000000000000000 R12: ffff9adbd0f6b810
[   80.475847] R13: ffff9adbd27fb180 R14: ffff9adbcd543420 R15: ffff9adbd0f02d00
[   80.483086] FS:  0000000000000000(0000) GS:ffff9adbec000000(0000) knlGS:0000000000000000
[   80.491278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   80.497130] CR2: 0000000000000000 CR3: 0000000103696004 CR4: 00000000001706f0
[   80.504385] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.511709] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   80.518959] Call Trace:
[   80.521613]  <TASK>
[   80.523821]  bfq_idle_extract+0x40/0xb0
[   80.527769]  bfq_put_idle_entity+0x12/0x60
[   80.531989]  bfq_bfqq_served+0xb0/0x1a0
[   80.535949]  bfq_dispatch_request+0x2fa/0x12b0
[   80.540506]  ? sbitmap_get+0x86/0x190
[   80.544276]  __blk_mq_do_dispatch_sched+0x1d1/0x300
[   80.549277]  __blk_mq_sched_dispatch_requests+0xd6/0x130
[   80.554697]  blk_mq_sched_dispatch_requests+0x30/0x60
[   80.559853]  __blk_mq_run_hw_queue+0x30/0xa0
[   80.564244]  __blk_mq_delay_run_hw_queue+0x182/0x1b0
[   80.569316]  blk_mq_sched_insert_requests+0x68/0xf0
[   80.574306]  blk_mq_flush_plug_list+0x196/0x300
[   80.578999]  blk_mq_submit_bio+0x305/0x780
[   80.583204]  submit_bio_noacct+0x2a5/0x2c0
[   80.587425]  iomap_submit_ioend+0x4e/0x80
[   80.591543]  iomap_do_writepage+0x4d4/0x810
[   80.595843]  write_cache_pages+0x156/0x390
[   80.600051]  ? iomap_truncate_page+0x40/0x40
[   80.604430]  iomap_writepages+0x1c/0x40
[   80.608372]  xfs_vm_writepages+0x6c/0x90 [xfs]
[   80.613040]  do_writepages+0xbf/0x1b0
[   80.616811]  ? blkcg_rstat_flush+0x28/0x1f0
[   80.621105]  ? fprop_fraction_percpu+0x2b/0x70
[   80.625754]  ? __wb_calc_thresh+0x2a/0x100
[   80.629976]  __writeback_single_inode+0x3d/0x310
[   80.634702]  writeback_sb_inodes+0x1d4/0x450
[   80.639083]  __writeback_inodes_wb+0x4c/0xe0
[   80.643459]  wb_writeback+0x1c9/0x2a0
[   80.647229]  wb_workfn+0x2c3/0x4e0
[   80.650738]  ? check_preempt_curr+0x55/0x70
[   80.655134]  ? ttwu_do_wakeup+0x17/0x160
[   80.659192]  process_one_work+0x1e8/0x3c0
[   80.663311]  worker_thread+0x50/0x3b0
[   80.667179]  ? rescuer_thread+0x370/0x370
[   80.671311]  kthread+0x16b/0x190
[   80.674650]  ? set_kthread_struct+0x40/0x40
[   80.678939]  ret_from_fork+0x22/0x30
[   80.682631]  </TASK>
[   80.684932] Modules linked in: xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay intel_rapl_msr intel_rapl_common kvm_intel kvm snd_pcsp irqbypass snd_pcm rapl snd_timer snd soundcore i2c_piix4 pvpanic_mmio pvpanic drm ip_tables vfat fat xfs rfkill dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover virtio_scsi serio_raw failover ipmi_devintf ipmi_msghandler fuse
[   80.729456] CR2: 0000000000000000
[   80.732877] ---[ end trace 900d3a15c2a8ae78 ]---
[   80.737599] RIP: 0010:rb_erase+0x106/0x350
[   80.741804] Code: 00 00 48 89 10 48 8b 02 a8 01 0f 84 35 02 00 00 48 83 e0 fc 75 01 c3 48 89 d1 48 89 c2 48 8b 42 08 48 39 c8 75 af 48 8b 42 10 <f6> 00 01 0f 84 1d 01 00 00 48 8b 70 10 48 85 f6 74 05 f6 06 01 74
[   80.760767] RSP: 0018:ffffb755005cb608 EFLAGS: 00010046
[   80.766113] RAX: 0000000000000000 RBX: ffff9adbcbae3158 RCX: 0000000000000000
[   80.773439] RDX: ffff9adbc3473098 RSI: 0000000000000000 RDI: ffff9adbd0f6b898
[   80.780677] RBP: ffff9adbd0f6b898 R08: ffff9adbcbae3160 R09: 0000000000000000
[   80.787916] R10: 0000000000000012 R11: 0000000000000000 R12: ffff9adbd0f6b810
[   80.795161] R13: ffff9adbd27fb180 R14: ffff9adbcd543420 R15: ffff9adbd0f02d00
[   80.802413] FS:  0000000000000000(0000) GS:ffff9adbec000000(0000) knlGS:0000000000000000
[   80.810696] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   80.816555] CR2: 0000000000000000 CR3: 0000000103696004 CR4: 00000000001706f0
[   80.823797] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.831049] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   80.838289] note: kworker/u2:3[254] exited with preempt_count 2

Full console output: console.txt

dustymabe commented 2 years ago

Saw this again today in kola-gcp#56 and kola-gcp#57.

dustymabe commented 2 years ago

Saw this again today in kola-gcp#100 with 35.20220406.20.0 and kernel-5.16.18-200.fc35.x86_64

Vogtinator commented 2 years ago

FTR, current patch series is https://lore.kernel.org/all/20220401102325.17617-1-jack@suse.cz/

dustymabe commented 1 year ago

We haven't seen this in some time. Maybe fixes landed upstream? Closing..

Vogtinator commented 1 year ago

> We haven't seen this in some time. Maybe fixes landed upstream? Closing..

Yep, they're in 5.18+.
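
For anyone checking a fleet against that cutoff, here is a minimal sketch that compares a `uname -r` style release string (or a `kernel-…` package name as quoted earlier in this thread) with 5.18. The names `kernel_version` and `has_bfq_fixes` are hypothetical, and the check only looks at the major.minor pair, so it does not account for any backports to stable or distribution kernels.

```python
import re

FIXED_IN = (5, 18)  # from the comment above: the fixes are in 5.18+


def kernel_version(release):
    """Parse (major, minor) from e.g. '5.16.18-200.fc35.x86_64'."""
    m = re.match(r"(?:kernel-)?(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def has_bfq_fixes(release):
    """True if the release is at or past the upstream version carrying the fixes."""
    return kernel_version(release) >= FIXED_IN
```

For example, `has_bfq_fixes("5.16.18-200.fc35.x86_64")` is false, matching the last kernel reported as affected in this issue.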