QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

System freeze after dom0 update #8088

Closed sidhussmann closed 1 year ago

sidhussmann commented 1 year ago

Qubes OS release

R4.1 fully updated on 2023-03-08

Brief summary

After updating dom0 using the regular Qubes update mechanism, the system repeatedly freezes after a random amount of time. Sometimes it freezes during boot (before the X server is up), other times after 30 minutes. The computer becomes unusable after that.

Steps to reproduce

System: Lenovo Thinkpad X1 Carbon 7th Gen, Intel i7-8565U

Update dom0 of a stable R4.1 installation, or do a fresh install of Qubes 4.1.2-rc2.

Expected behavior

System does not freeze.

Actual behavior

The system freezes after a random amount of time, most likely under higher CPU load.

I suspect a breaking change in the userland/configuration due to an update of dom0.

Other explanations that I considered:

Assumption 1: Breaking Xen/kernel update

Can be ruled out, because I booted into previous configurations via the GRUB menu and the freezes still happen. I tried the following kernels:

Assumption 2: Memory Failure

Can be ruled out because

  1. Memtest86+ succeeds
  2. A fresh Fedora 37 install on the same computer works for hours with various workloads.

Assumption 3: NVMe drive failure

Can be ruled out because

  1. An older NVMe disk with a late-2022 version of Qubes installed worked fine for hours.
  2. After updating dom0 on that NVMe disk, subsequent boots also resulted in system freezes.
  3. SMART scans on the NVMe drive(s) don't show any faults.
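For readers repeating the SMART check, a minimal sketch of interpreting the result of `smartctl` (from smartmontools, which may need to be installed first). `smartctl` reports its findings as a bitmask in its exit status; the bit meanings below follow the RETURN VALUES section of the smartctl(8) man page. The helper name `describe_smartctl_status`, the `SMART_DEVICE` variable, and the device path `/dev/nvme0` are illustrative assumptions, not part of the original report.

```shell
#!/bin/sh
# Sketch: decode the exit status of smartctl, which is a bitmask.
# Bit meanings follow the RETURN VALUES section of smartctl(8).
# describe_smartctl_status is a hypothetical helper, not part of smartmontools.

describe_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "bit 0: command line did not parse"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "bit 1: device open failed or device did not identify"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "bit 2: a SMART or other command to the disk failed"; fi
    if [ $((status & 8)) -ne 0 ]; then echo 'bit 3: SMART status check returned "DISK FAILING"'; fi
    if [ $((status & 16)) -ne 0 ]; then echo "bit 4: prefail attributes at or below threshold"; fi
    if [ $((status & 32)) -ne 0 ]; then echo "bit 5: some attributes were at or below threshold in the past"; fi
    if [ $((status & 64)) -ne 0 ]; then echo "bit 6: device error log contains errors"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "bit 7: device self-test log contains errors"; fi
    return 0
}

# Only scans when a device is given, e.g. SMART_DEVICE=/dev/nvme0 (example
# name). Needs root and smartmontools.
if [ -n "${SMART_DEVICE:-}" ]; then
    rc=0
    smartctl -H -A "$SMART_DEVICE" || rc=$?
    describe_smartctl_status "$rc"
fi
```

Run as root with `SMART_DEVICE=/dev/nvme0 sh check-smart.sh` (file name arbitrary). An exit status of 0 with a "PASSED" health line is what "no faults" would look like.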

Further investigations

Things I haven't tried

Questions

marmarek commented 1 year ago

What version of the xen-hypervisor package do you have? You may want to try 4.14.5-19 if you have an older one.
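A minimal sketch for answering that question from a dom0 terminal: it queries the installed `xen-hypervisor` package with `rpm` and compares it with 4.14.5-19, the build later confirmed in this thread to stop the freezes. The helper names `version_at_least` and `check_xen_hypervisor` are hypothetical, and the comparison assumes GNU coreutils `sort -V` and a single installed version of the package.

```shell
#!/bin/sh
# Sketch: report the installed xen-hypervisor version in dom0 and compare it
# with 4.14.5-19, the build reported in this thread to fix the freezes.
# Assumes GNU `sort -V` and a single installed xen-hypervisor package.

FIXED_VERSION="4.14.5-19"

# Succeeds if version $1 sorts at or after version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the installed version and a verdict.
# Returns 0 if new enough, 1 if older, 2 if the package (or rpm) is missing.
check_xen_hypervisor() {
    if ! installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' xen-hypervisor 2>/dev/null); then
        echo "xen-hypervisor not found (run this in dom0)" >&2
        return 2
    fi
    echo "installed xen-hypervisor: $installed"
    if version_at_least "$installed" "$FIXED_VERSION"; then
        echo "at least $FIXED_VERSION"
    else
        echo "older than $FIXED_VERSION; consider updating"
        return 1
    fi
}

check_xen_hypervisor || echo "check finished with status $?"
```

Plain `rpm -q xen-hypervisor` gives the same information without the comparison.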

sidhussmann commented 1 year ago

Thank you @marmarek for your help. I used a spare NVMe drive for my investigations. After installing Fedora and then Qubes R4.1.1 on that spare drive, I can't seem to boot my original NVMe drive anymore. Selecting that NVMe drive from the boot menu immediately returns me to the boot menu. Would [1] help in this case?

[1] https://www.qubes-os.org/doc/uefi-troubleshooting/#boot-device-not-recognized-after-installing

marmarek commented 1 year ago

UEFI keeps boot menu entries bound to a partition UUID; by installing Qubes on the spare disk, your "Qubes OS" entry got replaced with one pointing at that spare NVMe. You need to restore the one pointing at the original disk. The command on that page is a bit outdated; for R4.1 it would be:

 efibootmgr -v -c -u -L "Qubes OS" -l /EFI/qubes/grubx64.efi -d /dev/nvme0n1 -p 1

If you want to use a different disk from time to time, adjust the label to something else.
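Following that advice, a small wrapper sketch around the `efibootmgr` command quoted above, so that each disk can get its own entry and label. It is a dry run by default and only prints the command; setting `APPLY=1` executes it (root and a UEFI system required). The function name `add_qubes_boot_entry`, the `APPLY` switch, and the spare-disk device name `/dev/nvme1n1` are hypothetical; the loader path `/EFI/qubes/grubx64.efi` and partition 1 as the EFI system partition come from the command above.

```shell
#!/bin/sh
# Sketch: wrap the R4.1 efibootmgr command so disk, EFI system partition and
# label can vary per disk. Dry run by default (prints the command);
# set APPLY=1 to execute it as root on a UEFI system.
# add_qubes_boot_entry is a hypothetical helper name.

add_qubes_boot_entry() {
    disk=${1:?usage: add_qubes_boot_entry DISK [PARTITION] [LABEL]}
    part=${2:-1}              # EFI system partition number, 1 as in the comment above
    label=${3:-Qubes OS}      # use a distinct label for each disk
    if [ "${APPLY:-0}" = 1 ]; then
        efibootmgr -v -c -u -L "$label" -l /EFI/qubes/grubx64.efi -d "$disk" -p "$part"
    else
        printf 'efibootmgr -v -c -u -L "%s" -l /EFI/qubes/grubx64.efi -d %s -p %s\n' \
            "$label" "$disk" "$part"
    fi
}

# Original disk keeps the default label; the spare disk (example device name)
# gets its own label, as suggested above.
add_qubes_boot_entry /dev/nvme0n1
add_qubes_boot_entry /dev/nvme1n1 1 "Qubes OS (spare)"
```

Running `efibootmgr -v` before and after shows the existing entries and which disk each one points at; a stale entry can be removed with `efibootmgr -b XXXX -B`, where XXXX is its boot number.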

sidhussmann commented 1 year ago

Thank you, Marek, for your explanation. That worked like a charm. I can also confirm that with xen-hypervisor version 4.14.5-19, I have not experienced any freezes. For reference, the problem appeared on my machine with version 4.14.5-18.

gusgustavo commented 1 year ago

I just had this issue today with xen-hypervisor version 4.14.5-18. I enabled the dom0 testing repo and installed xen-hypervisor version 4.14.5-19.
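A minimal sketch of that step, updating only `xen-hypervisor` from the dom0 testing repository with `qubes-dom0-update`. It is a dry run by default and prints the command; `APPLY=1` runs it in dom0. The repository id `qubes-dom0-current-testing` is our understanding of the R4.1 dom0 testing repo name and should be checked against the Qubes testing documentation; the function name is hypothetical. Because Xen is loaded at boot, the new hypervisor only takes effect after a reboot.

```shell
#!/bin/sh
# Sketch: update only xen-hypervisor from the dom0 testing repository.
# Dry run by default (prints the command); set APPLY=1 to run it in dom0.
# Assumed repo id: qubes-dom0-current-testing (verify for your release).

update_xen_from_testing() {
    set -- sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing xen-hypervisor
    if [ "${APPLY:-0}" = 1 ]; then
        "$@" && echo "Reboot for the new hypervisor to take effect."
    else
        echo "$*"
    fi
}

update_xen_from_testing
```

Enabling the testing repository only for this one command keeps the rest of dom0 on the stable repository.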

loneicewolf commented 1 year ago

Thank you, @gusgustavo and @sidhussmann, for posting this. I'll try this as well, as I am having issues with mouse/keyboard stuttering myself (latest, 4.1.2).

andrewdavidwong commented 1 year ago

Looks like this turned out to be just a documentation issue. Is that correct? If so, we should probably keep it open until the documentation is updated.

andrewdavidwong commented 1 year ago

Oh, wait. It looks like the main bug was actually fixed by xen-hypervisor-4.14.5-19, so maybe not just a documentation issue after all. To be honest, it's not clear to me why the outdated UEFI troubleshooting documentation entered into this issue. Looks like maybe an off-topic question was asked and answered that should've been a separate issue. I guess I'll re-close this as resolved, then.

If anyone believes this issue is not yet resolved, or if anyone is still affected by this issue, please leave a comment, and we'll be happy to reopen it. Thank you.

sidhussmann commented 1 year ago

> Oh, wait. It looks like the main bug was actually fixed by xen-hypervisor-4.14.5-19

@andrewdavidwong that is correct

> To be honest, it's not clear to me why the outdated UEFI troubleshooting documentation entered into this issue. Looks like maybe an off-topic question was asked and answered that should've been a separate issue.

FYI, the outdated documentation topic entered this issue in this comment: https://github.com/QubesOS/qubes-issues/issues/8088#issuecomment-1465074696