QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

[Build 202407021159-4.3] openQA test fails in update2 on Novacustom NV41 #9335

Open marmarek opened 4 months ago

marmarek commented 4 months ago

Observation

openQA test in scenario qubesos-4.3-kernel-x86_64-system_tests_update@hw10 fails in update2

sys-net can't connect to the network after updating dom0 and restarting sys-net. Looking inside, the link is down.
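As a hedged illustration of what "the link is down" can be checked against, the sketch below reads the standard Linux sysfs attributes `operstate` and `carrier` for an interface from inside sys-net. The helper name `link_state` and the interface name used in the usage note are assumptions for illustration; they do not come from the openQA test itself.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier) for a network interface.

    carrier is True/False, or None when the attribute cannot be read.
    """
    base = Path(sysfs_root) / iface
    operstate = (base / "operstate").read_text().strip()
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        # Reading carrier fails while the interface is administratively down.
        carrier = None
    return operstate, carrier
```

Calling `link_state("ens6")` inside sys-net (the interface name is a guess and will differ per system) would be expected to report a missing carrier in the broken state, assuming the failure is visible at this level.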

The test fails repeatedly on this machine, but manually starting the system and restarting sys-net does not reproduce the problem.

Reproducible

Fails since (at least) Build 202407021159-4.3

Expected result

Last good: 202405281715-4.3 (or more recent)

Further details

Always latest result in this scenario: latest

This fails on a kernel-latest test, but sys-net is still using 6.6.31 at this point.

marmarek commented 3 months ago

I tried to diagnose this again and still have no idea what is going on. I compared the PCI config space between dom0 and sys-net, and all differences look legitimate. Reloading the r8169 module in sys-net fixes the network, yet there is no difference in config space between the broken and working states. I also don't see any suspicious messages from the driver itself, and I still cannot reproduce the issue outside of this one openQA test. The test does:

  1. Start the freshly installed system
  2. Install dom0 updates (this includes updating the VM kernel from 6.6.36 to 6.6.42; the issue also happened when updating from 6.6.31 to 6.6.36). Template updates are not installed at this point yet. No Xen update was involved (at least in some failures of this type).
  3. Execute qvm-shutdown --force --wait sys-net sys-firewall sys-whonix && sleep 2m && qvm-start sys-firewall sys-whonix
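Step 3 above can be sketched as a small dom0 script for repeated manual attempts. This is only a minimal sketch: the function name `restart_cycle` and its injectable `run`/`sleep` parameters are assumptions added for illustration, while the two command lines are copied from the step as written. sys-net is not started explicitly here either; presumably it comes up as the netvm of sys-firewall.

```python
import subprocess
import time

# Commands copied from the openQA step; the qube names are the ones listed there.
SHUTDOWN = ["qvm-shutdown", "--force", "--wait",
            "sys-net", "sys-firewall", "sys-whonix"]
START = ["qvm-start", "sys-firewall", "sys-whonix"]


def restart_cycle(run=subprocess.run, sleep=time.sleep, pause_s=120):
    """Shut the service qubes down, wait two minutes, start them again."""
    run(SHUTDOWN, check=True)
    sleep(pause_s)  # corresponds to "sleep 2m" in the original command
    run(START, check=True)
```

Passing a stub for `run` makes it possible to check the command sequence without touching any qube; in dom0 the defaults execute the real tools.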

I tried reproducing it on the system after those updates: switching sys-net back to the 6.6.36 kernel, switching to 6.6.42 after a restart, and switching the dom0 kernel to either of those versions. In all cases it worked when I did it manually.
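For the config-space comparison mentioned earlier in this comment, one possible way to diff two dumps byte by byte is sketched below. It is not necessarily how the comparison was actually done; the helper names `read_config` and `diff_config` are hypothetical, and the sysfs path assumes the standard Linux PCI `config` attribute (reading beyond the first 64 bytes typically requires root).

```python
from pathlib import Path


def read_config(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read the raw PCI config space of a device given its address (BDF)."""
    return (Path(sysfs_root) / bdf / "config").read_bytes()


def diff_config(a, b):
    """Return (offset, byte_a, byte_b) for every offset where the dumps differ.

    Only the common prefix is compared if the dumps have different lengths.
    """
    return [(off, x, y) for off, (x, y) in enumerate(zip(a, b)) if x != y]
```

Saving one dump in the broken state and one after reloading the driver, then printing `diff_config(broken, working)`, would list any changed offsets; an empty list matches the "no difference" observation above.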