Closed vishnukarthikl closed 3 months ago
https://github.com/NVIDIA/kubevirt-gpu-device-plugin/commit/f9edffc3eefe9a852d6bf617acbd9ad779150207 added the device IDs for H100s, but a build from before that commit reports device ID 2330 (H100 SXM) as:
allocatable:
  nvidia.com/gpu-vm-802_11BG_WIRELESS_CARDBUS_ADAPTER: "2"
This happens because the device ID also matches other entries in pci.ids:
cat utils/pci.ids | grep 2330
    1186 3a1a  WNA-2330 802.11bg Wireless CardBus Adapter
    2330  ZyWALL Turbo Card
    2330  DH89xxCC SMBus Controller
That exact issue no longer occurs after https://github.com/NVIDIA/kubevirt-gpu-device-plugin/commit/f9edffc3eefe9a852d6bf617acbd9ad779150207, but it could recur if pci.ids is updated and the device ID being searched for matches a different device's entry earlier in the file.
Some options are to parse pci.ids so that either 1) the lookup matches on the (vendor ID, device ID) pair, or 2) if the vendor ID is always NVIDIA, only the devices listed under 10de NVIDIA Corporation are parsed.
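The two options above can be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's actual code (the plugin itself is written in Go). The function name `lookup_device_name`, the sample data, and the "Hypothetical" vendor and chip entries are assumptions made for the example. It relies only on the standard pci.ids layout: vendor lines are unindented, device lines are indented by one tab, and subsystem lines by two tabs.

```python
import re
from typing import Optional

# Sample data in pci.ids layout. The device names are illustrative, and the
# "aaaa" vendor and its "0013" chip are hypothetical parents invented so the
# subsystem line from the grep output has somewhere to live.
SAMPLE_PCI_IDS = (
    "# vendor lines: no indent; device lines: one tab; subsystems: two tabs\n"
    "10de  NVIDIA Corporation\n"
    "\t2330  GH100 [H100 SXM5 80GB]\n"
    "8086  Intel Corporation\n"
    "\t2330  DH89xxCC SMBus Controller\n"
    "aaaa  Hypothetical Wireless Vendor\n"
    "\t0013  Hypothetical Wireless Chip\n"
    "\t\t1186 3a1a  WNA-2330 802.11bg Wireless CardBus Adapter\n"
)

VENDOR_RE = re.compile(r"^([0-9a-f]{4})\s+(.+)$")
DEVICE_RE = re.compile(r"^\t([0-9a-f]{4})\s+(.+)$")


def lookup_device_name(pci_ids_text: str, vendor_id: str, device_id: str) -> Optional[str]:
    """Return the device name for (vendor_id, device_id), or None if absent.

    Only device lines inside the requested vendor's block are considered, so
    a device ID that appears under another vendor, or inside a subsystem
    name, can never match.
    """
    vendor_id = vendor_id.lower()
    device_id = device_id.lower()
    in_vendor = False
    for line in pci_ids_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("\t\t"):
            continue  # subsystem line: never a device entry
        if line.startswith("\t"):
            if in_vendor:
                match = DEVICE_RE.match(line)
                if match and match.group(1) == device_id:
                    return match.group(2).strip()
            continue
        match = VENDOR_RE.match(line)
        if match:
            if in_vendor:
                return None  # left the requested vendor's block without a hit
            in_vendor = match.group(1) == vendor_id
        else:
            in_vendor = False  # e.g. the device-class section at the end of the file
    return None


print(lookup_device_name(SAMPLE_PCI_IDS, "10de", "2330"))
```

Passing `"10de"` as a fixed vendor ID gives option 2. Passing the vendor ID read from the device gives option 1.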
Oops, it seems this was already fixed in https://github.com/NVIDIA/kubevirt-gpu-device-plugin/commit/5b4613600c5f765809b275e19d351d8eadf48c05. I was using a very old version of the plugin.