Hey again,
I've more or less got this working. If I blacklist the nvidia modules from loading at boot, then once logged in I can load and remove them as much as I want, and virt-manager will automatically unbind the card and rebind it to nvidia. Unfortunately I haven't got nvidia prime render offloading working when the modules are loaded dynamically, but nvidia-xrun works great.
I don't know if this is of interest to you but I thought I'd drop a comment anyway :).
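For reference, a minimal sketch of the blacklist approach mentioned above, assuming a modprobe.d file such as /etc/modprobe.d/blacklist-nvidia.conf (the filename is arbitrary):

# /etc/modprobe.d/blacklist-nvidia.conf -- keep the nvidia stack from loading at boot
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm

The modules can then be loaded on demand (e.g. modprobe nvidia_drm, which should pull in the rest as dependencies) whenever the card is needed on the host.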
For example, you mention you use bumblebee to access your secondary card on the host OS.
My guide does not make use of bumblebee or prime offloading.
Any attempt I make at dynamically unbinding/binding the GPU seems to result in the nvidia driver being unbound (lspci -nnk -d 10de:1e84 shows no driver in use), but the scripts seem to hang and never finish (I assume this means the vfio driver fails to bind to the card). Do you have any idea what the issue could be? Any 'gotchas'?
Make sure that there are no processes running on your GPU by running nvidia-smi. It should look something like this:
Fri Apr 24 20:44:32 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0D:00.0 Off |                  N/A |
| 41%   38C    P8    13W / 260W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The nvidia driver is bound to the card on boot... I think perhaps this is where I'm going wrong.
I don't think this is where you're going wrong... There are many setups out there (including mine) that load the nvidia drivers at boot time. This should not affect your ability to bind/unbind the drivers, so long as your nvidia GPU isn't being used as the primary GPU (i.e. for the X server display session). You can go ahead and blacklist the nvidia drivers at boot time if you'd like, but then you'll have to run a script that loads the nvidia driver (modprobe) and attaches it to your card every time you want to use it on your host.
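A minimal sketch of what such a script might look like, assuming the card sits at 0000:0d:00.0 (the bus ID shown in the nvidia-smi output above; adjust it for your system) and was previously handed to vfio with virsh nodedev-detach:

#!/bin/bash
# Give the card back to the host and reload the nvidia stack.
set -e
# Hand the device back from vfio-pci to the host kernel.
virsh nodedev-reattach pci_0000_0d_00_0
# Optionally drop the vfio modules, then load the nvidia modules.
modprobe -r vfio_pci vfio_iommu_type1 vfio
modprobe nvidia_drm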
Hey, thanks for replying.
Since the last time I posted I've figured a few more things out. Basically, I can boot with the nvidia drivers loaded and then stop the display manager, unbind nvidia, bind to vfio, and everything works perfectly (though having to kill the DM is less than ideal). Doing as you've said and looking at nvidia-smi confirms that nvidia has Xorg listed as a process, argh. Do you know how I could stop this? I have no xorg.conf file, so I assume it's going off 'reasonable defaults'. I'm definitely rendering everything with the AMD card, so the nvidia card shouldn't be doing anything with Xorg...
Thanks again
Argh, so close to this working! By specifying my AMD GPU explicitly in xorg.conf.d, the nvidia driver no longer binds to Xorg and it can be unloaded and loaded just fine... but prime offloading doesn't work. I think this is because prime needs a 'GPU screen' or something along those lines (it complains it can't create a context). I'll try to figure it out tomorrow. Thanks again for the help.
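Roughly, the xorg.conf.d snippet I mean looks like this (only a sketch; the BusID is an example and has to match the AMD card's address reported by lspci):

# /etc/X11/xorg.conf.d/10-amdgpu.conf -- pin X to the AMD card only
Section "Device"
    Identifier "AMD"
    Driver     "amdgpu"
    BusID      "PCI:10:0:0"
EndSection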
Ok I lied, I didn't go to bed. Basically prime render offloading seems to work by having a 'GPU screen' as well as a regular X screen for your second graphics card... that's why Xorg is listed as a process for nvidia when I boot without blacklisting. This means I can easily stop sddm, change the driver, and restart sddm, and everything works fine. If I could somehow add/remove a provider while X is running that would be perfect, but yeah... that will be for tomorrow (for real this time).
Basically prime render offloading seems to work by having a 'GPU screen' as well as a regular X screen for your second graphics card... that's why Xorg is listed as a process for nvidia when I boot without blacklisting. This means I can easily stop sddm, change the driver, and restart sddm, and everything works fine.
I think you're stuck with having to start and stop sddm... However, I totally understand why you want to overcome this problem. Might I ask why you're trying to achieve prime offloading instead of just dedicating GPU-intensive work to the nvidia card from the host? I've heard of prime offloading working with an Intel iGPU + Nvidia card, but not in a hybrid AMD/Nvidia setup.
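If it helps, the stop/switch/restart flow you describe might look roughly like this as a script (an untested sketch; sddm and the bus ID are placeholders to adjust for your system):

#!/bin/bash
# Sketch: hand the nvidia card over to vfio-pci, then bring the desktop back on the AMD card.
systemctl stop sddm
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia
virsh nodedev-detach pci_0000_0d_00_0   # placeholder bus ID
modprobe vfio_pci
systemctl start sddm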
Yeah, it seems highly likely that the best solution is just restarting Xorg (it's still pretty painless). As for why I want to do this... I mainly need a Windows VM for VR development (my job/studies revolve around that), and Linux support for Oculus, among others, is non-existent. I also play any games Linux can't run in the VM... however, if a game can run with wine/proton then I generally prefer to do that, but I'd want to use my much more powerful card for it.
Regarding the Intel iGPU comment... all the documentation refers to using an iGPU plus Nvidia graphics, but it works absolutely perfectly with an AMD card acting as the secondary card. I can prefix any command with 'prime-run' and get the same performance as I would running on the nvidia card alone.
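As a quick sanity check (assuming glxinfo from the mesa-utils/mesa-demos package is installed), you can confirm which GPU a command lands on:

glxinfo | grep "OpenGL renderer"             # should report the AMD card
prime-run glxinfo | grep "OpenGL renderer"   # should report the nvidia card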
My only other suggestion would be to check out Looking Glass. I suggest this because it seems you don't want the hassle of having to switch to the Windows VM when you could instead play certain games on your Linux host through Lutris. What Looking Glass does is essentially run a low-latency KVM FrameRelay that allows you to render the gaming portion of your VM inside your host (sort of like a headless display). You obviously wouldn't be using Lutris anymore but I think that's honestly a good thing given the limited support it has.
Unfortunately, achieving prime offloading without having to do an x-restart for a KVM is beyond the scope of this tutorial... I wish you luck in your endeavor! Feel free to reach out to me over email (for further questions or to report your results) but I'll be closing this issue for now.
Hijacking this thread: I'm trying to get a very similar setup running with Pop!_OS 20.04. For me too, virt-manager stalls when creating the VM. I tried running the prepare/begin script and the virsh nodedev-detach command by hand; both stall, and after killing them even nvidia-smi stalls. Any ideas?
Graphics are running on a dedicated A2500 for the KVM session, with a separate IOMMU group. No processes are accessing the GPU.
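For reference, a few generic checks that show what is still holding the card before the detach (the PCI address below is a placeholder):

lsmod | grep -E 'nvidia|vfio'    # which driver modules are loaded
sudo lsof /dev/nvidia*           # processes that still have the device nodes open
lspci -nnk -s 0d:00.0            # which kernel driver is currently bound
sudo dmesg | tail -n 30          # recent bind/unbind errors from the kernel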
I'm trying really hard to find the documentation again, but it was something along the lines of the amdgpu/nvidia GPU drivers being loaded before vfio-pci. After changing the order in which the drivers are loaded, the card was able to bind/unbind at VM start/stop.
Maybe this could help: my /etc/initramfs-tools/modules (from link to reddit comment):
softdep amdgpu pre: vfio vfio_pci
softdep ahci pre: vfio vfio_pci
softdep xhci_hcd: vfio vfio_pci
vfio
vfio_iommu_type1
vfio_virqfd
vfio_pci
amdgpu
ahci
xhci_hcd
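After editing /etc/initramfs-tools/modules the initramfs needs to be regenerated for the change to take effect, e.g.:

sudo update-initramfs -u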
Hey all, I'm stuck on this too. I have an AMD HD 7750 in the secondary PCIe slot, meant to be used as the primary graphics card, and a GTX 1070 in the main slot. I can't seem to unbind the 1070, and the VM basically hangs on creating the domain. The OS treats the Nvidia GPU as the primary GPU and most processes basically run on it. I was able to switch the system76-power graphics mode to hybrid, which made most of the processes stop showing up in nvidia-smi (other than Xorg).
This still doesn't allow me to unbind the GPU though, and worse, I can't use the nvidia GPU on the host.
Yes, it's the same for me (Debian 12, GNOME 43.x under Wayland on a Lenovo Legion 5 Pro 16ACH6A, Radeon iGPU / Nvidia RTX 3070 dGPU, no xorg.conf). I was very excited when I discovered this tutorial, but had a hard landing. The issue is that even when system76-power puts the system into compute mode (after a reboot... already a downer), lsof /dev/nvidia* shows that the processes gnome-she, Xwayland, firefox-b and gnome-tex are still using the GPU or its modules (although nvidia-smi does not show a process...). All the excitement has kind of degraded into having a script, triggered by a grub boot entry, which alters or installs an xorg.conf... :( (Close to just booting Windows...)
Hey there, I was wondering if you could help me at all!
I'll just get the boring details out of the way:
Distro: Arch
CPU: Ryzen 5 3600
Primary GPU: Radeon RX 560
Secondary GPU for passthrough: GeForce RTX 2070 Super
RAM: 16GB Corsair Vengeance 3000 CL15
Drivers: Nvidia 440
So over the last few days I have gotten passthrough working perfectly by reserving the 2070 Super for vfio, passing the bus IDs in as kernel parameters in grub and loading the vfio modules in the modules part of mkinitcpio.
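Concretely, that static reservation looks roughly like this (a sketch only; 10de:1e84 is the 2070 Super's device ID from the lspci command quoted earlier, and 10de:xxxx stands in for the card's audio function ID):

# /etc/default/grub -- the kernel parameter that reserves the card for vfio
GRUB_CMDLINE_LINUX_DEFAULT="... vfio-pci.ids=10de:1e84,10de:xxxx"

# /etc/mkinitcpio.conf -- load the vfio modules early, before the graphics driver
MODULES=(vfio_pci vfio vfio_iommu_type1 vfio_virqfd)

followed by regenerating the initramfs (mkinitcpio -P) and the grub config (grub-mkconfig -o /boot/grub/grub.cfg).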
This works fine, but has the obvious drawback that I'm unable to use my RTX card on the host OS. I came across your guide as you have quite a similar setup to mine (yours obviously a bit better :^)), and you mention using hooks to bind and unbind the graphics card. Unfortunately, this doesn't seem to work for me, and I have a few questions about it.
Any attempt I make at dynamically unbinding/binding the GPU seems to result in the nvidia driver being unbound (lspci -nnk -d 10de:1e84 shows no driver in use), but the scripts seem to hang and never finish (I assume this means the vfio driver fails to bind to the card). Do you have any idea what the issue could be? Any 'gotchas'? I have nothing related to nvidia or vfio in mkinitcpio or the grub config.
The nvidia driver is bound to the card on boot... I think perhaps this is where I'm going wrong. For example, you mention you use bumblebee to access your secondary card on the host OS. To use my 2070 Super I use prime-run and Nvidia render offloading. Is there something bumblebee does (like blacklisting a driver) that would make your VM hooks work where mine don't? I have read somewhere that it blacklists nvidia_drm, which I'm currently using for render offloading. Do you know whether I should be blacklisting the nvidia driver on boot and then using something like bumblebee to activate it when I want to play games on the card?
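For anyone reading along, the hooks referred to here follow the commonly used VFIO hook-helper layout, roughly as below (the VM name win10 is just a placeholder):

/etc/libvirt/hooks/qemu                                   # dispatcher script
/etc/libvirt/hooks/qemu.d/win10/prepare/begin/start.sh    # runs before the VM starts (unbind nvidia, bind vfio-pci)
/etc/libvirt/hooks/qemu.d/win10/release/end/revert.sh     # runs after the VM stops (unbind vfio-pci, reload nvidia)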
Sorry for so many questions, I'm trying to direct my questioning a bit to take some of the burden off you when replying! Ha. I genuinely have no idea what to try to get this working, and there seems to be no log or error output that could show me what is failing in the background to troubleshoot...
Regards, Steve (Thanks for the guide btw)
EDIT
I'm now convinced it's because I was loading the nvidia driver on boot... I'm going to give bumblebee/nvidia-xrun a go and have a play around with that :)