canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

LXD physical gpu does not pass through the entire gpu #12643

Open megheaiulian opened 12 months ago

megheaiulian commented 12 months ago

When using a gpu device of type physical for a VM, LXD does not pass through all the component devices of the GPU.

For example using this device configuration

gpu:
  gputype: physical
  pci: c3:00.0
  type: gpu

produces this qemu config:

# GPU card ("gpu" device)
[device "dev-lxd_gpu"]
driver = "vfio-pci"
bus = "qemu_pcie5"
addr = "00.0"
multifunction = "on"
host = "0000:c3:00.0"
x-vga = "on"

The GPU used here (an RX66000) has an audio component at 0000:c3:00.1 that is not passed through. Without it, the AMD drivers will not initialize correctly.

Adding raw.qemu: -device vfio-pci,host=c3:00.1,bus=qemu_pcie5 makes this work but it is not very intuitive.
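For anyone applying the same workaround to other functions of a card, here is a minimal sketch of how the extra QEMU argument could be generated. The helper name `vfio_device_arg` is hypothetical. The bus name `qemu_pcie5` is taken from the generated config above and may differ on other instances.

```python
def vfio_device_arg(host: str, bus: str) -> str:
    """Build the QEMU argument used in the raw.qemu workaround.

    host: PCI address of the function to pass through, e.g. "c3:00.1".
    bus:  QEMU bus name the GPU was attached to (assumed to be known
          from the generated QEMU config, e.g. "qemu_pcie5").
    """
    return f"-device vfio-pci,host={host},bus={bus}"


def raw_qemu_value(hosts, bus: str) -> str:
    """Join one argument per extra PCI function into a single raw.qemu value."""
    return " ".join(vfio_device_arg(host, bus) for host in hosts)
```

Calling `raw_qemu_value(["c3:00.1"], "qemu_pcie5")` yields the same string as the workaround quoted above.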

roosterfish commented 12 months ago

Hi @megheaiulian, have you tried using the pci device type in LXD? https://documentation.ubuntu.com/lxd/en/latest/reference/devices_pci/

I guess it would be something like lxc config device add {vm} audio pci address=0000:c3:00.1.

megheaiulian commented 12 months ago

Yes, that would obviously work.

In https://github.com/canonical/lxd/blob/e90ae16cc7c573d5a6ad37783e888190afda9ffd/lxd/instance/drivers/driver_qemu.go#L4487 there is code that tries to grab the other devices from the same IOMMU group.
For some reason this is not working in this case. Could it be because the entry is prefixed with consumer, while the code at https://github.com/canonical/lxd/blob/e90ae16cc7c573d5a6ad37783e888190afda9ffd/lxd/instance/drivers/driver_qemu.go#L4495C15-L4495C24 checks for a prefix matching the pciSlotName?

This is how devices in that iommu group look for me:

[image]

roosterfish commented 11 months ago

Can you please send the output of ls /sys/bus/pci/devices/0000:c3:00.0/iommu_group/devices? That is the directory whose contents get iterated over.
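To make the discussion concrete, here is a rough sketch of the lookup being described: list the entries of the device's iommu_group/devices directory, then keep the ones whose name starts with the GPU's slot name. This is an illustration of the idea only, not LXD's actual Go code. The function names and the `sysfs_root` parameter are assumptions added for the example.

```python
from pathlib import Path


def iommu_group_devices(pci_address: str, sysfs_root: str = "/sys") -> list:
    """Return the sorted entry names of <device>/iommu_group/devices.

    Returns an empty list if the directory does not exist
    (for example when the IOMMU is disabled).
    """
    group_dir = (
        Path(sysfs_root) / "bus/pci/devices" / pci_address / "iommu_group/devices"
    )
    if not group_dir.is_dir():
        return []
    return sorted(entry.name for entry in group_dir.iterdir())


def same_slot(entries, pci_address: str) -> list:
    """Keep only the entries that share the slot (domain:bus:device) prefix."""
    slot = pci_address.rsplit(".", 1)[0]  # "0000:c3:00.0" -> "0000:c3:00"
    return [name for name in entries if name.startswith(slot)]
```

With a prefix check like this, an entry named consumer:pci:0000:c3:00.1 would be filtered out, and a function that is not listed in the directory at all is never seen.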

megheaiulian commented 11 months ago

It shows only the GPU device and not the audio component: [image]

gabrielmougard commented 11 months ago

@megheaiulian what is the output of cat /proc/cmdline? Check that you have iommu=pt amd_iommu=on in the output. Can you also show me the output of uname -r?

The way PCIe devices are set up on the motherboard can also affect IOMMU grouping. IOMMU groups are essentially how the system's hardware is compartmentalized for DMA (Direct Memory Access) protection, and their layout can vary with the motherboard's firmware or the physical configuration of the PCIe slots. If possible, try moving the GPU to a different slot, then run ls /sys/bus/pci/devices/<pci_base_addr>/iommu_group/devices and see whether two PCI addresses are listed.

Also, what are your motherboard firmware and GPU driver versions? Firmware updates can sometimes resolve hardware compatibility issues or improve IOMMU groupings. Ideally, check whether these are the latest versions; if not, update them and check ls /sys/bus/pci/devices/<pci_base_addr>/iommu_group/devices again.
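As a quick way to run the first check, here is a small sketch that reports which of the suggested kernel parameters are absent from the command line. The function names are hypothetical. The default parameter list is just the pair mentioned above, not a complete list of requirements.

```python
def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line of the running system."""
    with open(path) as f:
        return f.read()


def missing_kernel_params(cmdline: str, required=("iommu=pt", "amd_iommu=on")) -> list:
    """Return the required parameters that do not appear in cmdline.

    Parameters are compared as whole whitespace-separated tokens, so
    "amd_iommu=on" does not match "amd_iommu=off".
    """
    present = set(cmdline.split())
    return [param for param in required if param not in present]
```

On a host, `missing_kernel_params(read_cmdline())` returns an empty list when both parameters are set.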

gabrielmougard commented 11 months ago

As an example, with my GPU, I have:

$ lspci
...
42:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
42:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)

So, just like you, I have an audio component on the card. For the rest of the information:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.2.0-39-generic root=UUID=c37564b1-a9d6-464c-b413-0d895acf7c9f ro quiet splash iommu=pt amd_iommu=on pci=assign-busses kvm_amd.npt=1 kvm_amd.avic=1 kvm.ignore_msrs=1 vt.handoff=7
$ uname -r
6.2.0-39-generic

Unfortunately, I don't have an AMD card to test the driver version with. Lastly, on my side, I have:

$ ls /sys/bus/pci/devices/0000:42:00.0/iommu_group/devices
0000:42:00.0  0000:42:00.1

With both of these grouped under iommu_group/devices, I have no problem passing through multiple components.

megheaiulian commented 11 months ago

@gabrielmougard I have a correct IOMMU setup. I am able to pass the card through correctly to Windows or Linux VMs directly with QEMU. The only issue is that LXD does not pick up the audio component of that card and add it as a QEMU device, because it is not listed by ls /sys/bus/pci/devices/0000:c3:00.0/iommu_group/devices. Instead it seems to appear as /sys/bus/pci/devices/0000:c3:00.0/iommu_group/consumer:pci:0000:c3:00.1. It could be something in the kernel specific to AMD cards ...
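One way to spot this kind of mismatch on a given host is to compare the functions that share the GPU's slot in /sys/bus/pci/devices with the entries of its iommu_group/devices directory. The sketch below is a diagnostic under that assumption, not something LXD is known to do. The function names and the `sysfs_root` parameter are made up for the example.

```python
from pathlib import Path


def sibling_functions(pci_address: str, sysfs_root: str = "/sys") -> list:
    """List other PCI functions on the same slot (domain:bus:device)."""
    devices_dir = Path(sysfs_root) / "bus/pci/devices"
    if not devices_dir.is_dir():
        return []
    slot = pci_address.rsplit(".", 1)[0]  # "0000:c3:00.0" -> "0000:c3:00"
    return sorted(
        entry.name
        for entry in devices_dir.iterdir()
        if entry.name.rsplit(".", 1)[0] == slot and entry.name != pci_address
    )


def not_in_iommu_group(pci_address: str, sysfs_root: str = "/sys") -> list:
    """Return sibling functions absent from <device>/iommu_group/devices."""
    group_dir = (
        Path(sysfs_root) / "bus/pci/devices" / pci_address / "iommu_group/devices"
    )
    grouped = {e.name for e in group_dir.iterdir()} if group_dir.is_dir() else set()
    return [
        name
        for name in sibling_functions(pci_address, sysfs_root)
        if name not in grouped
    ]
```

On a setup like the one described here, `not_in_iommu_group("0000:c3:00.0")` would be expected to report the audio function, while on a host where both functions are grouped together it would return an empty list.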

gabrielmougard commented 11 months ago

Possibly. This is hard to know. @mihalicyn did you experience such an issue with an AMD card?