Open yojoe opened 1 year ago
Well, apparently it is loaded too late and there is some race condition with the kernel module loading the GPU driver.
If one checks /usr/lib/dracut/modules.d/, one will see that both 90kernel-modules and 90qubes-pciback exist. If the numbering has any relevance, the race condition is no surprise.
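To see which dracut modules share a numeric prefix on a given system, a small helper like the following could be used. This is only a sketch: the function name is hypothetical, and whether the shared prefix actually explains the race is still an open question here.

```shell
#!/bin/bash
# Sketch: list dracut module directories that start with a given numeric prefix.
# $1: prefix such as 90
# $2: modules directory (defaults to the path mentioned in this comment)
dracut_modules_with_prefix() {
    local prefix="$1" dir="${2:-/usr/lib/dracut/modules.d}"
    # Sort in the C locale so the order is stable across systems.
    ls -1 "$dir" | grep "^${prefix}" | LC_ALL=C sort
}

# Example call in dom0:
#   dracut_modules_with_prefix 90
```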
Anyway this is pretty bad indeed wrt security as VM devices shouldn't get access to dom0.
A bit related: #7886
On my side, rd.qubes.hide_pci works as expected (R4.1 and development tree). I didn't see a difference from R4.0.
@yojoe can you still reproduce this?
This is happening to me now, after repairing GRUB following a period where I could not boot.
Interesting.
I was able to fix it by unloading nouveau (modprobe -r nouveau), for what it's worth.
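A minimal sketch of that workaround, assuming the module in question is nouveau and that nothing in dom0 is still using it (otherwise the unload fails). The helper name is hypothetical; it only checks a /proc/modules-style listing.

```shell
#!/bin/bash
# Sketch: succeed if the named module appears in a /proc/modules-style listing.
# $1: module name
# $2: listing file (defaults to /proc/modules; overridable for testing)
module_loaded() {
    local name="$1" listing="${2:-/proc/modules}"
    # The first field of each line in /proc/modules is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Example use in dom0 (needs root):
#   if module_loaded nouveau; then sudo modprobe -r nouveau; fi
```

Unloading the driver this way only lasts until the next boot, so it addresses the symptom rather than the hiding itself.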
@OwOday what does lspci in dom0 show?
VGA compatible controller: NVIDIA Corporation AD102 [Geforce RTX 4090]
Qubes OS release
4.1
Brief summary
Hiding secondary GPU (AMD RX 580) from dom0 via Grub Command Line does not work anymore in Qubes 4.1. It was working on the same system with Qubes 4.0 previously.
Steps to reproduce
Set /etc/default/grub to hide the AMD Radeon RX 580 VGA and Audio devices from dom0 and regenerate grub.cfg.
Verify after reboot via cat /proc/cmdline that the rd.qubes.hide_pci argument is there and has no typos.
Expected behavior
After reboot the following two PCI devices should not be visible to dom0 and lspci shouldn't enumerate them anymore:
Actual behavior
After reboot, lspci in dom0 still enumerates the two PCI devices. Also the amdgpu kernel module is loaded (shown in lsmod) and bound to the secondary GPU. Although there's no display connected to the secondary GPU and it's "idle", I can hear the fans of the RX 580. If I then try a GPU passthrough of the RX 580 to a HVM domU, the domU tries to initialize the RX 580, the fans stop spinning, and after a delay of about 10 seconds dom0 crashes/freezes because it has an active amdgpu module that is still bound to the VGA device of the RX 580. AFAIK it is kind of expected that dom0 crashes if you try a PCI passthrough of a device that is still bound to dom0.
However, if I blacklist the amdgpu module from dom0 via /etc/modprobe.d/, the passthrough to domU works, although the RX 580 PCI devices are still visible to dom0. I thought that maybe amdgpu grabs the VGA device before dracut runs the 90qubes-pciback/qubes-pciback.sh script, which evaluates the rd.qubes.hide_pci Grub command line argument. But this doesn't seem to be the root cause of the hiding not working. Anyway, blacklisting amdgpu fixes the symptom of passthrough not working, but doesn't fix the proper hiding from dom0.
dmesg -k | grep "01:00.0" -B10 -A5 doesn't show any obvious errors regarding pciback hiding:
I tried with multiple different kernel versions in Qubes 4.1 from the kernel and kernel-latest packages and even the old 5.4 leftover from the previous Qubes 4.0 install before the upgrade to 4.1. But this doesn't make a difference: hiding the RX 580 from dom0 doesn't work with any of these kernel versions under 4.1, but was working on 4.0.
Seems like I'm not the only one with this issue/bug: https://forum.qubes-os.org/t/gpu-passthrough-again/14019