Closed: Moonlight63 closed this issue 3 years ago
Hey! I'm facing the same issue. I have a R9 270x for the host and a 1070ti for the guest. When I try to:
virsh nodedev-detach $VIRSH_GPU_VIDEO
It just hangs. My nvidia-smi is as follows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:06:00.0 Off | N/A |
| 43% 47C P8 12W / 180W | 6MiB / 8119MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 611 G /usr/lib/Xorg 4MiB |
+-----------------------------------------------------------------------------+
The only thing I can think of that might be creating an issue is Xorg locking the GPU somehow, but I haven't seen it mentioned anywhere else. I'll try to poke around, and if I get around to solving it I'll post here. If you have any suggestions, they are highly appreciated.
So I've tried some things and tried to understand what's happening in each step. I noticed something. When I boot my PC I have the following:
╰─$ lspci -nnk | grep -A3 -e VGA
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X] [1002:6810]
Subsystem: PC Partner Limited / Sapphire Technology Device [174b:e270]
Kernel driver in use: radeon
Kernel modules: radeon, amdgpu
--
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] [10de:1b82] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. Device [19da:2445]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
After I ran the command
╰─$ sudo virsh nodedev-detach pci_0000_06_00_0
I opened another terminal and looked at the kernel drivers for the GPUs. The output was:
╰─$ lspci -k | grep -A3 -e VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X]
Subsystem: PC Partner Limited / Sapphire Technology Device e270
Kernel driver in use: radeon
Kernel modules: radeon, amdgpu
--
06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1)
Subsystem: ZOTAC International (MCO) Ltd. Device 2445
Kernel modules: nouveau, nvidia_drm, nvidia
06:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
As you can see, before the command the card was using the nvidia driver; after running it, there is no driver in use at all. I guess that means there is a problem loading the vfio-pci driver, but I'm not sure. I'll try to look into it.
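If it helps, you can also read the bound driver straight out of sysfs instead of grepping lspci. A rough sketch (the 06:00.0 address is from my machine, substitute yours):

```shell
# print the kernel driver currently bound to a PCI device, or "none"
driver_of() {
    link="/sys/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
driver_of 0000:06:00.0   # "nvidia" before the detach, "none" while it's stuck
```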
Are you sure that after the detach command is run, the nvidia driver is actually detached? Can you run that by hand and make sure the driver is really detached? If not, then I had a similar issue.
Ok, so I ended up getting everything to work. A friend of mine on Discord and I sat down to figure it out, and we made some very interesting discoveries. Bear in mind, this is what the solution was in my case, but you will have to do your own testing to see if this works for you.
Problem number one: when gnome boots with the nvidia GPU attached, it likes to grab onto the GPU and never let go, regardless of what you put in your xorg.conf file. The way I figured this out was by using the command lsof, which prints a list of every single open file. So, by using grep I found this little gem:
... lots of stuff up here ...
gnome-she 15558 user mem REG 195,0 762 /dev/nvidia1
... more stuff down here ...
where /dev/nvidia1 in my case was a reference to my second GPU. Since you only have one nvidia card, yours would likely be nvidia0.
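The grep I used can be wrapped up so it's easy to rerun at each step (just a sketch; it filters `lsof` output for anything touching an NVIDIA device node):

```shell
# keep only lsof lines that reference an NVIDIA device node
gpu_holders() {
    grep '/dev/nvidia[0-9]'
}
# usage (root sees other users' open files too):
#   sudo lsof | gpu_holders
```

If gnome-shell shows up in the output, the compositor is holding the GPU regardless of what your xorg.conf says.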
Even after running the detach command, gnome-shell is still using the GPU, i.e. the file /dev/nvidia1 is still 'open', despite my xorg.conf telling it not to. For reference, here is my xorg.conf:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 460.73.01
Section "ServerFlags"
    Option "AutoAddGPU" "off"
EndSection

Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
    Option "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/psaux"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection

Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "SAC DP"
    HorizSync 30.0 - 222.0
    VertRefresh 30.0 - 144.0
    Option "DPMS"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce GTX 1080"
    BusID "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "Stereo" "0"
    Option "nvidiaXineramaInfoOrder" "DFP-6"
    Option "metamodes" "DP-4: 2560x1440_144 +2560+0, DP-0: 2560x1440_144 +0+0, DP-2: 2560x1440_144 +5120+0"
    Option "SLI" "Off"
    Option "MultiGPU" "On"
    Option "BaseMosaic" "off"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
The relevant parts there being the Device section, where I set my 1080 as the only GPU and never make a second device for the 1070, and the "AutoAddGPU" "off" option. But gnome still uses it for some reason.
The way I solved this was by setting up my system like any other GPU passthrough system before it: load the vfio drivers at boot and bind them to the second GPU, then rebind the nvidia driver after boot. Here is a quick refresher, but I am doing this from memory, so please correct me if I miss something:
Step 1: Edit /etc/initramfs-tools/modules and add the lines
vfio
vfio_pci
vfio_iommu_type1
vfio_virqfd
This will load the drivers on boot.
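After rebooting, you can sanity-check that the modules actually loaded (a small sketch that filters `lsmod` output):

```shell
# print the vfio-related modules from `lsmod` output on stdin
vfio_loaded() {
    awk '$1 ~ /^vfio/ {print $1}'
}
# usage after reboot:
#   lsmod | vfio_loaded
```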
Step 2: Create script in /etc/initramfs-tools/scripts/init-top called bind_vfio.sh ( or whatever you want ) with this content:
#!/bin/sh
DEVS="0000:02:00.0 0000:02:00.1" # <--- Change these to your specific device ids
for DEV in $DEVS; do
    echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done
modprobe -i vfio-pci # <--- Not actually sure if you need this, but it works
Step 3:
sudo chmod 755 /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
sudo chown root:root /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
sudo update-initramfs -u
I had to reboot my system before running the update command, otherwise it got hung up on something, not really sure why.
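Once it boots, you can confirm the early bind actually took. A sketch (02:00.x are my addresses; the helper just pulls the driver line out of `lspci -k` output):

```shell
# extract the "Kernel driver in use" value from `lspci -k` output on stdin
driver_line() {
    awk -F': ' '/Kernel driver in use/ {print $2}'
}
# usage, one call per function of the card:
#   lspci -ks 0000:02:00.0 | driver_line   # should print vfio-pci
#   lspci -ks 0000:02:00.1 | driver_line
```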
More details on regular vfio setups can be found here: https://forum.level1techs.com/t/vfio-in-2019-pop-os-how-to-general-guide-though-draft/142287
Once that is all done you would basically be ready to do GPU passthrough as normal..... BUT WE WANT TO USE THE GPU ON THE HOST NOW, RIGHT?! Yes, we do, so now what?
If you run the same lsof command as before, you will not see any reference to the second GPU, because it was never attached to gnome, and that's good. So now you can run virsh nodedev-reattach pci_0000_02_00_0
and the nvidia driver should re-bind. To help visualize what's going on, you can open another terminal and run watch -n 0.1 lspci -s 0000:02:00.* -nk
to see which drivers the card is using in real time.
But hold on, running the reattach command isn't enough. If you try to open something like Blender and go to Edit > Preferences > System > CUDA, your GPU won't be listed. Similarly, if you check nvidia-smi
it won't be there either. That is because, on the first reattach, you need to run nvidia-xconfig --query-gpu-info
to force the driver to recognize the 'new' GPU.
At this point you can use the GPU on the host. When you're done, run virsh nodedev-detach pci_0000_02_00_0
to detach the driver; it should then re-bind to vfio and you can use it in the guest.
BUT WAIT! THERE'S MORE!
Unfortunately, our testing uncovered another problem... the audio driver, snd_hda_intel.
If you ever reattach the audio device to the host, you will never be able to unbind it again... at least, I couldn't. This is a problem because, by default, when you close the VM that is using the device, it releases its resources, including the audio device, and the audio device re-binds to snd_hda_intel. At that point you have to actually reboot so that the vfio driver can re-bind to the audio device. I suspect that this is what causes the majority of the detach-script hangs.
I haven't found a very good solution to this other than to just blacklist that driver. Unfortunately for me, that also prevents my primary GPU from outputting audio over HDMI. It isn't a huge deal for me because I use headphones anyway, but it still isn't a very clean solution. A better solution would be to blacklist the driver on just this one device at this specific PCI address, but I haven't figured out how to do that.
So in the end, what I finished with was the following:
Step 1:
Create the file /etc/modprobe.d/blacklist-intel-snd.conf
with:
blacklist snd_hda_intel
install snd_hda_intel /bin/false
This will prevent the intel audio driver from ever loading. Run the update-initramfs command again.
Step 2:
Create a new systemd service in /etc/systemd/system/bind-second-gpu.service
with the following:
[Unit]
Description=Bind my second GPU to Nvidia only after gnome has started
PartOf=graphical-session.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'sleep 10; virsh nodedev-reattach pci_0000_02_00_0; sleep 2; nvidia-xconfig --query-gpu-info'

[Install]
WantedBy=graphical-session.target
After that, run sudo systemctl enable bind-second-gpu.service
This will auto bind the second GPU back to the host on a fresh boot.
I then proceeded with the regular qemu hooks with the bind script being:
#!/bin/bash
## Load the config file
source "/etc/libvirt/hooks/kvm.conf"
## Unbind gpu from nvidia and bind to vfio
virsh nodedev-detach $VIRSH_GPU_VIDEO
virsh nodedev-detach $VIRSH_GPU_AUDIO
And unbind:
#!/bin/bash
## Load the config file
source "/etc/libvirt/hooks/kvm.conf"
## Unbind gpu from vfio and bind to nvidia
virsh nodedev-reattach $VIRSH_GPU_VIDEO
virsh nodedev-reattach $VIRSH_GPU_AUDIO
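For completeness, the kvm.conf that both hook scripts source is just two shell variables. Example contents (these are my addresses; take yours from virsh nodedev-list --cap pci):

```shell
# /etc/libvirt/hooks/kvm.conf  (example values; replace with your own)
VIRSH_GPU_VIDEO=pci_0000_02_00_0
VIRSH_GPU_AUDIO=pci_0000_02_00_1
```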
Reattaching the audio device really does nothing, because there is no driver left for it to attach to.
THIS WILL ONLY LET YOU USE THE SECOND GPU FOR CUDA ON THE HOST! THERE IS NO VIDEO OUTPUT WHEN REATTACHED! It would be cool if we could get that working, but I don't think you could without starting a second X session that could be killed when you want to switch. Also, if the second GPU is being used and you try to start a VM, the VM won't start until the GPU is freed. For example, if I am running a render in Blender using that GPU, I have to close Blender before the VM will start.
OK, I think that is everything. Please remember to replace my PCI ID pci_0000_02_00_0
with whatever you are using. Thank you @NineBallAYAYA for staying up til 5am 3 nights in a row to help me diagnose this, lol
Came across Moonlight63's comment here; it ended up being very helpful. I figured out that you can block snd_hda_intel
from loading on a single device using a udev rule: just create a file at /etc/udev/rules.d/90-vfio-gpu-audio.rules
and drop this in there. Make sure to replace both occurrences of the ID with your GPU's audio device ID:
SUBSYSTEM=="pci", KERNEL=="0000:0b:00.1", PROGRAM="/bin/sh -c 'echo -n 0000:0b:00.1 > /sys/bus/pci/drivers/snd_hda_intel/unbind'"
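To keep the two occurrences of the ID in sync, you can generate the rule instead of typing it twice (just a sketch):

```shell
# print a udev rule that unbinds snd_hda_intel from one PCI device
audio_unbind_rule() {
    printf 'SUBSYSTEM=="pci", KERNEL=="%s", PROGRAM="/bin/sh -c '\''echo -n %s > /sys/bus/pci/drivers/snd_hda_intel/unbind'\''"\n' "$1" "$1"
}
audio_unbind_rule 0000:0b:00.1
```

After writing the rule file, sudo udevadm control --reload-rules picks it up; the unbind itself only runs the next time the device is processed (e.g. sudo udevadm trigger --subsystem-match=pci, or a reboot).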
In addition, to get this working with Dracut instead of initramfs-tools, you can create a directory at /usr/lib/dracut/modules.d/64bind-vfio
and put the initramfs script in there. Then, you'll need to create another script file in that same directory named module-setup.sh
and paste this in there:
#!/bin/sh

check() {
    return 0
}

depends() {
    return 0
}

install() {
    # Replace the filename here after $moddir/ with the name you gave the other script!
    inst_hook pre-trigger 64 "$moddir/force-vfio-pci.sh"
}
After that, you should be able to add it just like a standard Dracut module.
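Enabling the module is the standard dracut drop-in; the module name is the directory name minus the two-digit prefix, so for /usr/lib/dracut/modules.d/64bind-vfio it would be:

```shell
# /etc/dracut.conf.d/bind-vfio.conf
add_dracutmodules+=" bind-vfio "
```

Then rebuild the initramfs with sudo dracut -f.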
(creating a new issue with the same body as my reply to the old one) Originally posted by @Moonlight63 in https://github.com/bryansteiner/gpu-passthrough-tutorial/issues/16#issuecomment-843735151
Hello, Thank you for the guide. I have been running passthrough for a while, but just did a fresh install of Pop and thought it might be nice to be able to use my second gpu when not running VMs. I am having the same issues as others here.
Running
virsh nodedev-detach $VIRSH_GPU_VIDEO
causes a hang.
My GPUs are in their own IOMMU groups.
I have modified my xorg.conf so that it only uses the 1080 for the host, I have disabled AutoAddGPU, and I have a 3-monitor setup with all 3 plugged into the three DisplayPorts on the 1080. My full config is this:
And finally I have verified that the 1070 is not being used by anything with nvidia-smi:
As a side note, my CPU doesn't list the virtualization option as VT-d in dmesg, but rather calls it by its full name.
Possibly because it's a Xeon? Just thought I would mention it for others who come here.
Anyway, as far as I can tell, I've done everything mentioned and I can't find anything else that would be stopping the unload. Any ideas? Yes, my scripts are executable, and I've been trying to run the commands one by one in a terminal to catch an error exit, but since virsh nodedev-detach never completes and just hangs, no error is reported. Any help is greatly appreciated.