QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Recent changes cause suspend to fail #9124

Open BetoHydroxyButyrate opened 2 months ago

BetoHydroxyButyrate commented 2 months ago

Qubes OS release

4.2

Brief summary

After a recent reboot, systemctl suspend no longer suspends the system.

Steps to reproduce

systemctl suspend

Expected behavior

The system enters the suspended state. I normally suspend it several times per day.

Actual behavior

The system does not suspend, and dmesg shows many errors:

[  185.026824] pvqspinlock: lock 0xffff888100c52a80 has corrupted value 0x0!
[  185.026841] WARNING: CPU: 1 PID: 654 at kernel/locking/qspinlock_paravirt.h:508 __pv_queued_spin_unlock_slowpath+0xc2/0xd0
[  185.026866] Modules linked in: vfat snd_hda_codec_hdmi snd_sof_pci_intel_tgl fat snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec_realtek snd_soc_acpi_intel_match intel_rapl_msr snd_soc_acpi snd_hda_codec_generic soundwire_generic_allocation ledtrig_audio intel_rapl_common soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_uncore_frequency_common snd_hda_codec ee1004 snd_hda_core snd_hwdep mei_pxp mei_hdcp snd_seq pmt_telemetry pmt_class snd_seq_device snd_pcm snd_timer joydev snd i2c_i801 iwlwifi wmi_bmof pcspkr i2c_smbus soundcore igc ov13858 mei_me cfg80211 mei v4l2_fwnode thunderbolt idma64 rfkill v4l2_async intel_vsec videodev mc intel_pmc_core acpi_tad loop fuse xenfs dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic i915 nvme ghash_clmulni_intel
[  185.026991]  i2c_algo_bit sha512_ssse3 drm_buddy sha256_ssse3 nvme_core sha1_ssse3 xhci_pci ttm xhci_pci_renesas wdat_wdt drm_display_helper xhci_hcd ucsi_acpi cec typec_ucsi nvme_common typec video wmi pinctrl_tigerlake xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
[  185.027048] CPU: 1 PID: 654 Comm: kworker/1:3 Not tainted 6.6.25-1.qubes.fc37.x86_64 #1
[  185.027055] Hardware name: Intel(R) Client Systems NUC13ANHi5/NUC13ANBi5, BIOS ANRPL357.0027.2023.0607.1754 06/07/2023
[  185.027059] Workqueue: events ata_scsi_dev_rescan
[  185.027073] RIP: e030:__pv_queued_spin_unlock_slowpath+0xc2/0xd0
[  185.027085] Code: d4 25 04 ff 90 c3 cc cc cc cc 8b 05 cc 7f f3 00 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7 18 a3 b1 81 e8 8e 9e 12 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 44 00 00 90 90 90 90 90 90 90 90
[  185.027091] RSP: e02b:ffffc90040757db8 EFLAGS: 00010286
[  185.027096] RAX: 0000000000000000 RBX: ffff888100cea000 RCX: 0000000000000027
[  185.027100] RDX: ffff8881b3661588 RSI: 0000000000000001 RDI: ffff8881b3661580
[  185.027104] RBP: ffff8881086a3d30 R08: ffffffff81e66200 R09: 0000000000ffff10
[  185.027107] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000200
[  185.027111] R13: ffff8881086a2040 R14: 0000000000400000 R15: ffff8881086a24c0
[  185.027133] FS:  0000000000000000(0000) GS:ffff8881b3640000(0000) knlGS:0000000000000000
[  185.027138] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  185.027142] CR2: 00007fc8ce63d36e CR3: 000000016fb26000 CR4: 0000000000050660
[  185.027154] Call Trace:
[  185.027160]  <TASK>
[  185.027162]  ? __pv_queued_spin_unlock_slowpath+0xc2/0xd0
[  185.027173]  ? __warn+0x81/0x130
[  185.027185]  ? __pv_queued_spin_unlock_slowpath+0xc2/0xd0
[  185.027195]  ? report_bug+0x171/0x1a0
[  185.027204]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[  185.027213]  ? __up_console_sem.constprop.0+0x35/0x40
[  185.027224]  ? handle_bug+0x41/0x70
[  185.027231]  ? exc_invalid_op+0x17/0x70
[  185.027237]  ? asm_exc_invalid_op+0x1a/0x20
[  185.027249]  ? __pv_queued_spin_unlock_slowpath+0xc2/0xd0
[  185.027259]  ? __pv_queued_spin_unlock_slowpath+0xc2/0xd0
[  185.027268]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x15/0x30
[  185.027280]  .slowpath+0x9/0x16
[  185.027289]  _raw_spin_unlock_irqrestore+0xe/0x40
[  185.027297]  ata_scsi_dev_rescan+0x162/0x1a0
[  185.027306]  process_one_work+0x171/0x340
[  185.027318]  worker_thread+0x27b/0x3a0
[  185.027325]  ? __pfx_worker_thread+0x10/0x10
[  185.027329]  kthread+0xe5/0x120
[  185.027338]  ? __pfx_kthread+0x10/0x10
[  185.027345]  ret_from_fork+0x31/0x50
[  185.027353]  ? __pfx_kthread+0x10/0x10
[  185.027360]  ret_from_fork_asm+0x1b/0x30
[  185.027369]  </TASK>
[  185.027371] ---[ end trace 0000000000000000 ]---

Full dmesg attached: dmesg.txt
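
For triage, the identifying fields of the trace above (kernel release, hardware name, and the workqueue that was running) can be pulled out of a dmesg capture with a few regular expressions. The following is a minimal Python sketch, not part of the original report; the function name summarize_trace and the patterns are illustrative assumptions based only on the lines shown in this issue.

```python
import re

# Three lines copied verbatim from the trace in this report.
SAMPLE = """\
[  185.027048] CPU: 1 PID: 654 Comm: kworker/1:3 Not tainted 6.6.25-1.qubes.fc37.x86_64 #1
[  185.027055] Hardware name: Intel(R) Client Systems NUC13ANHi5/NUC13ANBi5, BIOS ANRPL357.0027.2023.0607.1754 06/07/2023
[  185.027059] Workqueue: events ata_scsi_dev_rescan
"""

# Hypothetical patterns, modelled on the lines above only.
PATTERNS = {
    "kernel": r"Comm: .* (\S+) #\d+",
    "hardware": r"Hardware name: (.+?), BIOS",
    "workqueue": r"Workqueue: (.+)",
}


def summarize_trace(text):
    """Return the kernel release, hardware name, and workqueue found in a dmesg excerpt.

    Fields that are not present in the text are reported as None.
    """
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        summary[key] = match.group(1).strip() if match else None
    return summary


if __name__ == "__main__":
    for key, value in summarize_trace(SAMPLE).items():
        print(f"{key}: {value}")
```

Pasting these three fields at the top of a report saves readers from searching the module list and register dump for them.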

BetoHydroxyButyrate commented 2 months ago

dom0 update was last checked today, last updated on 2024-04-16.

I rebooted yesterday, or perhaps today, because I stuffed up: I ran qvm-pause --all and lost my keyboard. Prior to that, suspend/resume worked. Since the reboot, it does not.

UndeadDevel commented 2 months ago

Some info about your hardware might help, as I'm also running the 6.6.25 kernel on a fully up-to-date Qubes 4.2 and suspend works on my NV41.

You could also try booting into an earlier kernel to see if that fixes it (diagnostically).

marmarek commented 2 months ago

> [ 185.027055] Hardware name: Intel(R) Client Systems NUC13ANHi5/NUC13ANBi5, BIOS ANRPL357.0027.2023.0607.1754 06/07/2023

Some info is already here. I might have NUC13 somewhere to check.

BetoHydroxyButyrate commented 2 months ago

> Some info about your hardware might help, as I'm also running the 6.6.25 kernel on a fully up-to-date Qubes 4.2 and suspend works on my NV41.
>
> You could also try booting into an earlier kernel to see if that fixes it (diagnostically).

As @marmarek mentioned, I intentionally included dmesg.txt because it contains additional debug info, in particular the hardware details. Is there anything else you need?

Just booted into 6.1.75-1. Suspend/resume works again. See dmesg.log.

dmesg.log
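
One way to compare the failing 6.6.25-1 capture (dmesg.txt) with the working 6.1.75-1 capture (dmesg.log) is to list the kernel WARNING sites that appear in one but not the other. A minimal Python sketch follows. The helper names and the script name compare_dmesg.py are hypothetical, the regular expression is modelled on the single WARNING line quoted in this issue, and the contents of dmesg.log are not shown here, so no particular result is implied.

```python
import re
import sys
from collections import Counter

# Matches WARNING headers of the form seen in this issue, e.g.
# "WARNING: CPU: 1 PID: 654 at kernel/locking/qspinlock_paravirt.h:508 func+0xc2/0xd0"
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\S+?)\+0x")


def warning_sites(dmesg_text):
    """Count kernel WARNINGs in a dmesg capture, keyed by (file, line, function)."""
    counts = Counter()
    for match in WARNING_RE.finditer(dmesg_text):
        source_file, line_no, function = match.groups()
        counts[(source_file, int(line_no), function)] += 1
    return counts


def only_in_first(first_text, second_text):
    """Return warning sites present in the first capture but absent from the second."""
    first = warning_sites(first_text)
    second = warning_sites(second_text)
    return sorted(site for site in first if site not in second)


if __name__ == "__main__":
    # Hypothetical invocation: python compare_dmesg.py dmesg.txt dmesg.log
    if len(sys.argv) == 3:
        with open(sys.argv[1], errors="replace") as failing, \
                open(sys.argv[2], errors="replace") as working:
            for site in only_in_first(failing.read(), working.read()):
                print(site)
```

Any site printed this way is a candidate for a regression between the two kernels, although a warning that is absent from the older kernel's log is only a hint, not proof of the cause.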

BetoHydroxyButyrate commented 2 months ago

> [ 185.027055] Hardware name: Intel(R) Client Systems NUC13ANHi5/NUC13ANBi5, BIOS ANRPL357.0027.2023.0607.1754 06/07/2023
>
> Some info is already here. I might have NUC13 somewhere to check.

If you find a NUC13, can you see if systemctl reboot works? Mine shuts down, but the final reboot/reset never happens, and I have to hold the power button in for 10 seconds or so to force a poweroff. systemctl poweroff works.