Closed. clrfuerst closed this issue 1 year ago.
@clrfuerst can you try using the latest kubevirt-gpu-device-plugin image, v1.2.2? Set sandboxDevicePlugin.version=v1.2.2 in ClusterPolicy. Note that the PCI ID database was updated in v1.2.2, so the L40 GPU should now be named with its device name (rather than its device ID). You will have to update your HyperConverged configuration accordingly.
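For anyone applying the suggestion above, the two changes might look roughly like the sketch below. The ClusterPolicy name `gpu-cluster-policy` is an assumption (use whatever your ClusterPolicy is called), and `<device-name>` is a placeholder: this thread does not state the new resource name, so read it from the Capacity section of `oc describe node` after the plugin restarts. The `10DE` prefix is the NVIDIA PCI vendor ID and `26B5` is the device ID from the logs in this issue.

```yaml
# ClusterPolicy fragment: pin the sandbox device plugin to v1.2.2
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy   # assumed name, adjust to your cluster
spec:
  sandboxDevicePlugin:
    version: v1.2.2
---
# HyperConverged fragment (spec only): permit the L40 under its new name
spec:
  permittedHostDevices:
    pciHostDevices:
    - pciDeviceSelector: "10DE:26B5"
      # placeholder: use the name the node reports after the upgrade
      resourceName: nvidia.com/<device-name>
      externalResourceProvider: true
```

The `resourceName` here has to match what the node advertises, and any VM that requested `nvidia.com/26b5` would need the same rename.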
Thank you for the pointer, this seems to have done the trick.
1. Quick Debug Checklist
kubevirt-hyperconfig
spec:
  permittedHostDevices:
    pciHostDevices:

oc describe node XXXX
Capacity:
  nvidia.com/26b5: 1
Allocatable:
  nvidia.com/26b5: 1
1. Issue or feature description
I get the following errors when trying to use an L40 GPU with PCI passthrough to a virtual machine. The GPU is not assigned and the VM does not start.
From the nvidia-sandbox-device-plugin-daemonset:
2023/07/10 19:41:03 Nvidia device 0000:e2:00.0
2023/07/10 19:41:03 Iommu Group 128
2023/07/10 19:41:03 Device Id 26b5
2023/07/10 19:41:03 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
2023/07/10 19:41:03 Iommu Map map[128:[{0000:e2:00.0}]]
2023/07/10 19:41:03 Device Map map[26b5:[128]]
2023/07/10 19:41:03 vGPU Map map[]
2023/07/10 19:41:03 GPU vGPU Map map[]
2023/07/10 19:41:03 Error: Could not find device name for device id: 26b5
2023/07/10 19:41:03 DP Name 26b5
2023/07/10 19:41:03 Devicename 26b5
2023/07/10 19:41:03 26b5 Device plugin server ready
From the virt-launcher pod trying to allocate the device:
server error. command SyncVMI failed: "failed to create GPU host-devices: the number of GPU/s do not match the number of devices:\nGPU: [{26b5 nvidia.com/26b5}]\nDevice: []"
{"component":"virt-launcher","level":"warning","msg":"PCI_RESOURCE_NVIDIA_COM_26B5 not set for resource nvidia.com/26b5","pos":"addresspool.go:50","timestamp":"2023-07-11T16:11:34.667518Z"}
2. Steps to reproduce the issue
Try to launch a VM with PCI passthrough using an L40 GPU instead of an A40 GPU.
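The VM definition is not included in this report. Going by the virt-launcher error, which lists the requested GPU as `{26b5 nvidia.com/26b5}`, the GPU request was presumably along these lines. This is a reconstruction for illustration, not the reporter's actual manifest; only the `name` and `deviceName` values come from the error message.

```yaml
# VirtualMachine fragment (illustrative reconstruction)
spec:
  template:
    spec:
      domain:
        devices:
          gpus:
          - name: "26b5"                  # quoted so it stays a string
            deviceName: nvidia.com/26b5   # resource advertised by the device plugin
```

The `deviceName` has to match both a resource the node advertises and a `resourceName` permitted in the HyperConverged configuration.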