QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Occasionally unable to legally shutdown - kernel is soft locked up #7696

Open logoerthiner1 opened 2 years ago

logoerthiner1 commented 2 years ago


Qubes OS release

Qubes OS R4.1, dom0 kernel 5.15.52

Brief summary

Aug 15 13:57:28 dom0 kernel: Modules linked in: loop vfat fat snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi iTCO_wdt snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pc>
Aug 15 13:57:28 dom0 kernel:  processor_thermal_mbox processor_thermal_rapl intel_rapl_common idma64 snd_timer snd mei_me mei intel_hid int3403_thermal soundcore int340x_thermal_zone wmi thunderbolt i2c_i801 int3400_thermal intel_soc_dt>
Aug 15 13:57:28 dom0 kernel: CPU: 5 PID: 39461 Comm: kworker/5:3 Tainted: G        W    L    5.15.52-1.fc32.qubes.x86_64 #1
Aug 15 13:57:28 dom0 kernel: Hardware name: LENOVO 20X4A00BCD/20X4A00BCD, BIOS R1JET55W (1.55 ) 02/28/2022
Aug 15 13:57:28 dom0 kernel: Workqueue: events bpf_prog_free_deferred
Aug 15 13:57:28 dom0 kernel: RIP: e030:smp_call_function_many_cond+0x114/0x2c0
Aug 15 13:57:28 dom0 kernel: Code: 8b 75 08 e8 ae c9 58 00 3b 05 5c e3 9b 01 89 c7 73 21 48 63 c7 48 8b 55 00 48 03 14 c5 c0 ba 6e 82 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb cc 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41
Aug 15 13:57:28 dom0 kernel: RSP: e02b:ffffc9004064fd28 EFLAGS: 00000202
Aug 15 13:57:28 dom0 kernel: RAX: 0000000000000011 RBX: 0000000000000001 RCX: 0000000000000004
Aug 15 13:57:28 dom0 kernel: RDX: ffffe8ffffb2e680 RSI: 0000000000000000 RDI: 0000000000000004
Aug 15 13:57:28 dom0 kernel: RBP: ffff888175372400 R08: 0000000000000000 R09: 0000000000000000
Aug 15 13:57:28 dom0 kernel: R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000208
Aug 15 13:57:28 dom0 kernel: R13: 0000000000000000 R14: 0000000000000008 R15: ffff888175372400
Aug 15 13:57:28 dom0 kernel: FS:  0000000000000000(0000) GS:ffff888175340000(0000) knlGS:0000000000000000
Aug 15 13:57:28 dom0 kernel: CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 15 13:57:28 dom0 kernel: CR2: 0000616128d334c0 CR3: 0000000101fbe000 CR4: 0000000000050660
Aug 15 13:57:28 dom0 kernel: Call Trace:
Aug 15 13:57:28 dom0 kernel:  <TASK>
Aug 15 13:57:28 dom0 kernel:  ? __flush_tlb_all+0x30/0x30
Aug 15 13:57:28 dom0 kernel:  on_each_cpu_cond_mask+0x1e/0x20
Aug 15 13:57:28 dom0 kernel:  __purge_vmap_area_lazy+0xd4/0x700
Aug 15 13:57:28 dom0 kernel:  ? purge_fragmented_blocks+0xc5/0x200
Aug 15 13:57:28 dom0 kernel:  _vm_unmap_aliases.part.0+0x110/0x140
Aug 15 13:57:28 dom0 kernel:  __vunmap+0x15c/0x290
Aug 15 13:57:28 dom0 kernel:  bpf_jit_free+0x5c/0xa0
Aug 15 13:57:28 dom0 kernel:  process_one_work+0x1ee/0x390
Aug 15 13:57:28 dom0 kernel:  worker_thread+0x4c/0x310
Aug 15 13:57:28 dom0 kernel:  ? process_one_work+0x390/0x390
Aug 15 13:57:28 dom0 kernel:  kthread+0x124/0x150
Aug 15 13:57:28 dom0 kernel:  ? set_kthread_struct+0x40/0x40
Aug 15 13:57:28 dom0 kernel:  ret_from_fork+0x1f/0x30
Aug 15 13:57:28 dom0 kernel:  </TASK>
Aug 15 13:57:56 dom0 kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 1889s! [kworker/5:3:39461]
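To read a trace like this one, it helps to know that each entry has the form `symbol+offset/size` (both in hex), and that entries prefixed with `?` are addresses the unwinder found on the stack but could not confirm as part of the call chain. Below is a minimal Python sketch for pulling the confirmed frames and the watchdog summary out of pasted journal lines. It is not a Qubes or kernel tool: the regexes and the names `parse_frame`, `reliable_frames` and `parse_soft_lockup` are hypothetical and assume the `Mon DD HH:MM:SS host kernel: message` line format shown in this issue.

```python
import re
from typing import NamedTuple, Optional

# One call-trace entry, e.g. "? __flush_tlb_all+0x30/0x30" or
# "_vm_unmap_aliases.part.0+0x110/0x140": an optional "?" marker, the symbol,
# then the offset into the function and the function size, both in hex.
# (Assumed format, matching the lines pasted in this issue.)
FRAME_RE = re.compile(
    r"(?P<unreliable>\?\s+)?(?P<symbol>[A-Za-z0-9_.]+)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Watchdog summary, e.g. "soft lockup - CPU#5 stuck for 1889s! [kworker/5:3:39461]".
# The command name may itself contain ":", so the PID is taken after the last one.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


class Frame(NamedTuple):
    symbol: str
    offset: int  # byte offset of the address within the function
    size: int  # total size of the function in bytes
    reliable: bool  # False for "?" entries the unwinder could not confirm


class Lockup(NamedTuple):
    cpu: int
    seconds: int
    comm: str
    pid: int


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one journal line; return None if it holds no symbol+off/size."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        symbol=m["symbol"],
        offset=int(m["offset"], 16),
        size=int(m["size"], 16),
        reliable=m["unreliable"] is None,
    )


def reliable_frames(lines):
    """Return symbols of the confirmed (non-"?") frames, in input order."""
    frames = (parse_frame(line) for line in lines)
    return [f.symbol for f in frames if f is not None and f.reliable]


def parse_soft_lockup(line: str) -> Optional[Lockup]:
    """Extract CPU, stuck time, command and PID from a watchdog line."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return Lockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
```

For the watchdog line in this log, `parse_soft_lockup` gives CPU 5, 1889 s (about 31 minutes), command `kworker/5:3` and PID 39461, and `reliable_frames` over the call trace drops the four `?` entries.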

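This seems to be related to unmapping kernel memory: the stuck worker (Workqueue: events bpf_prog_free_deferred) is inside __vunmap / __purge_vmap_area_lazy and is spinning in smp_call_function_many_cond, waiting for the other CPUs to respond.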

Steps to reproduce

I am not sure what is needed to reproduce this.

Possibly related circumstances:

  1. I had built a new template and cloned it into a thin pool on my SSD.

  2. I had just run into #7602 when I shut down.

  3. I had shut down every AppVM before shutting down dom0.

Expected behavior

No kernel soft lockup occurs during shutdown.

Actual behavior

A kernel soft lockup occurs during shutdown.

DemiMarie commented 1 year ago

That is a rather weird backtrace!

logoerthiner1 commented 1 year ago

This is most likely related to #7340, which turned out to be a Xen timer malfunction that causes kernel soft lockups, so hopefully this issue is fixed now as well.

Maybe keep it open for a while and see whether similar issues occur again.