Open AnErrupTion opened 3 months ago
I see the split lock detection triggers in your dmesg log. That will cause issues for the VM, up to the point where it may not make any progress. I am not sure whether that is the root cause of your issue, but please try the recommendation from the README and see if it helps:
Starting with Intel Tiger Lake (11th Gen Core processors) or newer, split lock detection must be turned off in the host system. This can be achieved using the Linux kernel command line parameter split_lock_detect=off or using the split_lock_mitigate sysctl.
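To check which of the two settings from the README is actually in effect on a host, a small script can help. This is only a sketch: it assumes the sysctl is exposed as /proc/sys/kernel/split_lock_mitigate (it may be absent on older kernels), and the function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def split_lock_status(cmdline: str, mitigate: Optional[str]) -> dict:
    """Summarise split-lock settings from raw text.

    cmdline  -- contents of /proc/cmdline
    mitigate -- contents of the split_lock_mitigate sysctl file,
                or None if the file does not exist on this kernel
    """
    detect = None
    for param in cmdline.split():
        if param.startswith("split_lock_detect="):
            detect = param.split("=", 1)[1]
    return {
        "detect_param": detect,            # value after '=', or None if not set
        "detection_off": detect == "off",
        "mitigate": mitigate.strip() if mitigate is not None else None,
    }


def read_host_status() -> dict:
    """Read the live values from procfs (Linux host only)."""
    cmdline = Path("/proc/cmdline").read_text()
    # Assumed path of the kernel.split_lock_mitigate sysctl.
    sysctl = Path("/proc/sys/kernel/split_lock_mitigate")
    mitigate = sysctl.read_text() if sysctl.exists() else None
    return split_lock_status(cmdline, mitigate)
```

Calling read_host_status() on the host returns a dictionary; detection_off is True only when split_lock_detect=off is present on the running kernel's command line.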
I was pretty sure I had already disabled it. Either way, adding the command line parameter didn't change anything, although I now see this in dmesg:
Unknown kernel command line parameters "split_lock_detect=off", will be passed to user space.
But I also see x86/split lock detection: disabled earlier in the log, so I'm assuming it's actually disabled now.
@snue is correct.
Here we have it
[ 2109.050169] x86/split lock detection: #AC: EMT-0/4675 took a split_lock trap at address: 0xfffff8021f251f4f
> Unknown kernel command line parameters "split_lock_detect=off", will be passed to user space.
Yes, this is expected.
> But I also see x86/split lock detection: disabled earlier in the log, so I'm assuming it's actually disabled now.
Sounds about right. Did it solve your issue?
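For anyone checking their own dmesg output for the two messages discussed here, a rough sketch of a parser follows. The regular expressions are modelled only on the log lines quoted in this thread, so treat the exact message format as an assumption; the function name is hypothetical.

```python
import re

# Modelled on: "x86/split lock detection: #AC: EMT-0/4675 took a
# split_lock trap at address: 0xfffff8021f251f4f"
TRAP_RE = re.compile(
    r"x86/split lock detection: #AC: (?P<task>.+?)/(?P<pid>\d+) "
    r"took a split_lock trap at address: (?P<addr>0x[0-9a-fA-F]+)"
)
# Modelled on: "x86/split lock detection: disabled"
DISABLED_RE = re.compile(r"x86/split lock detection: disabled")


def summarize_split_lock(dmesg_text: str) -> dict:
    """Report whether detection was logged as disabled and list any traps."""
    traps = [m.groupdict() for m in TRAP_RE.finditer(dmesg_text)]
    return {
        "disabled_reported": bool(DISABLED_RE.search(dmesg_text)),
        "traps": traps,
    }
```

If traps is non-empty for a boot where disabled_reported is also True, the parameter probably did not take effect for that boot and the logs are worth a second look.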
Unfortunately, it didn't solve the issue.
@AnErrupTion can you post new logs with split lock disabled?
Ah yes, my bad. Here they are:
It looks a little bit better and the guest is definitely trying to use the GPU:
00:00:07.099476 VFIO: RegisterBar 0xf0000000
00:00:07.099500 VFIO: RegisterBar 0x800000000
00:00:07.099501 VFIO: RegisterBar 0x900000000
00:00:07.099503 VFIO: RegisterBar 0x6000
00:00:07.099809 VFIO: Activate MSI count: 1
and
[ 43.766761] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
I assume this card needs some kind of quirk. I can maybe look into this in a couple of weeks.
Can you upload the output of lspci -vvvn please?
> I assume this card needs some kind of quirk. I can maybe look into this in a couple of weeks.
I'm not sure if it does, since passing through the same GPU with QEMU works just fine (no additional quirks needed or shenanigans).
> Can you upload the output of lspci -vvvn please?
Alright, here's the output (when run as root): lspci.log
> I'm not sure if it does, since passing through the same GPU with QEMU works just fine (no additional quirks needed or shenanigans).
QEMU automatically applies the necessary quirks when it detects a card that needs them.
Is there a way of knowing which ones it applies? I can fire up a QEMU VM if needed.
Also, I guess I forgot to mention one interesting bit: when I went to check for updates in the VM, Windows Update did not download the NVIDIA driver and I had to download it manually (but then it installed fine afterwards). And, when I went to Device Manager, it said that the driver used is not the same one as the POSTed graphics driver, or something like this. None of this happened with QEMU either.
There are quite a few NVIDIA quirks in QEMU. The quirky MSI handling is an obvious suspect, but so is the mirrored config space access in general. See this background discussion: https://patchwork.kernel.org/project/qemu-devel/patch/20180129202326.9417.71344.stgit@gimli.home/
Just maybe, you can force the GPU into legacy interrupt mode instead of MSI in the Windows VM to try and work around that?
I have tried to disable MSI by setting MSISupported in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI\VEN_10DE&DEV_25A2&SUBSYS_13FC1043&REV_A1\3&267a616a&0&80\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties to 0 instead of 1, but unfortunately, the problem still persists. One interesting thing though is that, in the utility I was using (MSI mode utility v3.1), my GPU doesn't actually appear in the list of devices, even though it's present in the registry and it also supports MSI (though that last part shouldn't matter, because devices that don't support MSI also appear in the program's list).
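For reference, the registry change described in this comment can be written down as a .reg file instead of being edited by hand. The sketch below only builds the file text; the function name is hypothetical, and the device instance path must be taken from the guest's own registry (the one in this comment is specific to this GPU and slot).

```python
def msi_reg_patch(device_instance_path: str, supported: bool) -> str:
    """Build .reg file text that sets MSISupported for one PCI device.

    device_instance_path -- e.g. r"PCI\\VEN_...\\3&..." as shown under
                            HKLM\\SYSTEM\\CurrentControlSet\\Enum
    supported            -- True writes 1 (MSI), False writes 0 (legacy interrupts)
    """
    key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Enum\\"
        + device_instance_path
        + "\\Device Parameters\\Interrupt Management"
        "\\MessageSignaledInterruptProperties"
    )
    value = 1 if supported else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"MSISupported"=dword:{value:08x}',
        "",
    ]
    return "\n".join(lines)
```

Writing the returned text to a file with a .reg extension and importing it inside the Windows guest should have the same effect as the manual edit; a reboot of the guest is typically needed before the interrupt mode changes.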
Bug Description
When following the guide over here, adapted to pass through a dedicated GPU, a code 43 error can be observed after installing the GPU drivers in the guest system using Device Manager.
How to Reproduce
vfio-pci is correctly bound to the GPU
/etc/security/limits.conf
/dev/vfio/*
--attachvfio
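The first precondition, that vfio-pci is bound to the GPU, can be verified from sysfs on the host. The following is a minimal sketch, assuming the usual sysfs layout where each PCI device directory has a driver symlink; the function name and the sysfs_root parameter are illustrative only.

```python
import os
from typing import Optional


def bound_driver(bdf: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to a PCI device, or None.

    bdf -- PCI address in domain:bus:device.function form, e.g. "0000:01:00.0"
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        # No driver symlink: device missing or no driver bound.
        return None
    # The symlink points at the driver's directory; its last
    # path component is the driver name.
    return os.path.basename(os.path.realpath(link))
```

For the device in this report, bound_driver("0000:01:00.0") would be expected to return "vfio-pci" before the VM is started.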
VM configuration
Guest OS configuration details:
VirtualBox VMs/<guest VM name>/<guest VM name>.vbox: Windows 11.vbox.zip
Host OS details:
Linux shininglea 6.10.4-arch2-1 #1 SMP PREEMPT_DYNAMIC Sun, 11 Aug 2024 16:19:06 +0000 x86_64 GNU/Linux
Logs