NVIDIA / kubevirt-gpu-device-plugin

NVIDIA k8s device plugin for Kubevirt
BSD 3-Clause "New" or "Revised" License

getDeviceName can potentially match incorrect device based on ordering in pci.ids #106

Closed: vishnukarthikl closed this issue 3 months ago

vishnukarthikl commented 4 months ago

https://github.com/NVIDIA/kubevirt-gpu-device-plugin/commit/f9edffc3eefe9a852d6bf617acbd9ad779150207 added the device IDs for H100s. A build from before that commit reports a device with ID 2330 (H100 SXM) as:

allocatable:
    nvidia.com/gpu-vm-802_11BG_WIRELESS_CARDBUS_ADAPTER: "2"

This happens because the device ID string also matches the name of a different device that appears earlier in pci.ids:

cat utils/pci.ids | grep 2330
        1186 3a1a  WNA-2330 802.11bg Wireless CardBus Adapter
    2330  ZyWALL Turbo Card
    2330  DH89xxCC SMBus Controller

This exact case no longer occurs after https://github.com/NVIDIA/kubevirt-gpu-device-plugin/commit/f9edffc3eefe9a852d6bf617acbd9ad779150207, but the same problem can come back whenever pci.ids is updated and the device ID being searched for matches another device's name earlier in the list.

Some options are to parse pci.ids so that either 1) the lookup matches on the full (vendor ID, device ID) pair, or 2) if the vendor ID is always NVIDIA's, only the devices listed under 10de NVIDIA Corporation are parsed.
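
A rough sketch of option 1 in Go, relying on the pci.ids layout (vendor lines start in column 0, device lines start with one tab, subsystem lines start with two tabs). `lookupDeviceName` is a hypothetical helper, not the plugin's actual `getDeviceName`. The sample data is an illustrative excerpt: the `aaaa`/`bbbb` parent lines are placeholders and the NVIDIA device name is assumed.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// lookupDeviceName (hypothetical helper) returns the name registered for
// deviceID inside the block of vendorID only. Subsystem lines and devices
// of other vendors are never compared, so a string such as "WNA-2330"
// elsewhere in the file cannot produce a match.
func lookupDeviceName(pciIDs, vendorID, deviceID string) (string, bool) {
	scanner := bufio.NewScanner(strings.NewReader(pciIDs))
	inVendor := false
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		switch {
		case strings.HasPrefix(line, "\t\t"):
			continue // subsystem entry, not a device ID
		case strings.HasPrefix(line, "\t"):
			if !inVendor {
				continue // device of some other vendor
			}
			// Device line: "<id>  <name>", separated by two spaces.
			fields := strings.SplitN(strings.TrimPrefix(line, "\t"), "  ", 2)
			if len(fields) == 2 && strings.EqualFold(fields[0], deviceID) {
				return strings.TrimSpace(fields[1]), true
			}
		default:
			if inVendor {
				return "", false // left the vendor block without a match
			}
			// Top-level line: "<vendor id>  <vendor name>".
			inVendor = strings.HasPrefix(strings.ToLower(line), strings.ToLower(vendorID)+"  ")
		}
	}
	return "", false
}

// samplePciIDs is an illustrative excerpt, not the real file. The aaaa/bbbb
// lines are placeholders and the NVIDIA device name is an assumption.
var samplePciIDs = strings.Join([]string{
	"# illustrative excerpt",
	"aaaa  Example Vendor",
	"\tbbbb  Example Device",
	"\t\t1186 3a1a  WNA-2330 802.11bg Wireless CardBus Adapter",
	"10de  NVIDIA Corporation",
	"\t2330  GH100 [H100 SXM5 80GB]",
	"8086  Intel Corporation",
	"\t2330  DH89xxCC SMBus Controller",
}, "\n")

func main() {
	name, ok := lookupDeviceName(samplePciIDs, "10de", "2330")
	fmt.Println(name, ok)
}
```

Option 2 is the same loop with the vendor ID fixed to `10de`. Either way, the match is made against the ID field of a device line rather than against any text in the file.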

vishnukarthikl commented 3 months ago

Oops, it seems this was already fixed in https://github.com/NVIDIA/kubevirt-gpu-device-plugin/commit/5b4613600c5f765809b275e19d351d8eadf48c05. I was using a very old version of the plugin.