NVIDIA / kubevirt-gpu-device-plugin

NVIDIA k8s device plugin for Kubevirt
BSD 3-Clause "New" or "Revised" License

Environment variable case issues when device name is not found. #70

Open greg-bock opened 1 year ago

greg-bock commented 1 year ago

If a device ID is not present in the pci.ids file, the plugin falls back to using the device ID as the device name. However, normalization and capitalization of the device name for use in the environment variable only occur in getDeviceName, during the pci.ids lookup. As a result, both the resource name and the resource-name portion of the environment variable stay lowercase. KubeVirt expects an all-uppercase environment variable.

Suggested fixes include:

+++ b/pkg/device_plugin/device_plugin.go
@@ -101,7 +101,7 @@ func createDevicePlugins() {
                deviceName := getDeviceName(k)
                if deviceName == "" {
                        log.Printf("Error: Could not find device name for device id: %s", k)
-                       deviceName = k
+                       deviceName = strings.ToUpper(k)
                }
                log.Printf("DP Name %s", deviceName)
                dp := NewGenericDevicePlugin(deviceName, "/sys/kernel/iommu_groups/", devs)
@@ -123,7 +123,7 @@ func createDevicePlugins() {
                }
                deviceName := getDeviceName(k)
                if deviceName == "" {
-                       deviceName = k
+                       deviceName = strings.ToUpper(k)
                }
                log.Printf("DP Name %s", deviceName)
                dp := NewGenericVGpuDevicePlugin(deviceName, vGpuBasePath, devs)

or possibly

+++ b/pkg/device_plugin/device_plugin.go
@@ -99,10 +99,6 @@ func createDevicePlugins() {
                        })
                }
                deviceName := getDeviceName(k)
-               if deviceName == "" {
-                       log.Printf("Error: Could not find device name for device id: %s", k)
-                       deviceName = k
-               }
                log.Printf("DP Name %s", deviceName)
                dp := NewGenericDevicePlugin(deviceName, "/sys/kernel/iommu_groups/", devs)
                err := startDevicePlugin(dp)
@@ -122,9 +118,6 @@ func createDevicePlugins() {
                        })
                }
                deviceName := getDeviceName(k)
-               if deviceName == "" {
-                       deviceName = k
-               }
                log.Printf("DP Name %s", deviceName)
                dp := NewGenericVGpuDevicePlugin(deviceName, vGpuBasePath, devs)
                err := startVgpuDevicePlugin(dp)
@@ -319,7 +312,6 @@ func getDeviceName(deviceID string) string {
                                continue
                        }
                        deviceName = strings.TrimSpace(splits[1])
-                       deviceName = strings.ToUpper(deviceName)
                        deviceName = strings.Replace(deviceName, "/", "_", -1)
                        deviceName = strings.Replace(deviceName, ".", "_", -1)
                        reg, _ := regexp.Compile("\\s+")
@@ -333,5 +325,13 @@ func getDeviceName(deviceID string) string {
        if err := scanner.Err(); err != nil {
                log.Printf("Error reading pci ids file %s", err)
        }
+
+       if deviceName == "" {
+               log.Printf("Error: Could not find device name for device id: %s", deviceID)
+               deviceName = deviceID
+       }
+
+       deviceName = strings.ToUpper(deviceName)
+
        return deviceName
 }
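
For illustration, the second approach (falling back to the device ID first, then normalizing in one place) might look roughly like the standalone sketch below. The helper names `normalizeDeviceName` and `resolveDeviceName`, the in-memory map standing in for the pci ids file, and the sample IDs are all hypothetical. Replacing whitespace runs with an underscore is an assumption, since the diff above does not show what the regexp is used for.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// whitespace matches runs of whitespace; compiled once rather than per lookup.
var whitespace = regexp.MustCompile(`\s+`)

// normalizeDeviceName is a hypothetical helper that applies the same
// normalization to every name, whether it came from the lookup or the fallback.
func normalizeDeviceName(name string) string {
	name = strings.TrimSpace(name)
	name = strings.ToUpper(name)
	name = strings.ReplaceAll(name, "/", "_")
	name = strings.ReplaceAll(name, ".", "_")
	// Assumption: whitespace runs are collapsed to a single underscore.
	return whitespace.ReplaceAllString(name, "_")
}

// resolveDeviceName looks up deviceID in names (a stand-in for the pci ids
// file) and falls back to the device ID itself when no entry exists.
func resolveDeviceName(deviceID string, names map[string]string) string {
	name, ok := names[deviceID]
	if !ok {
		name = deviceID
	}
	return normalizeDeviceName(name)
}

func main() {
	// Made-up sample data, not real pci ids entries.
	names := map[string]string{"abcd": "Example Device 1.0/A"}

	fmt.Println(resolveDeviceName("abcd", names)) // known ID
	fmt.Println(resolveDeviceName("beef", names)) // unknown ID, falls back to the ID
}
```

With this shape the fallback path goes through the same uppercase conversion as a successful lookup, so a lowercase hex ID such as `beef` cannot leak into the resource name.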

rthallisey commented 1 year ago

@greg-bock can you open a PR? I'll review and get it merged, thanks.

greg-bock commented 1 year ago

We've decided to use the internal KubeVirt device plugin, but I thought I'd drop a note since I ran across this during some testing. I'm unsure which method would be preferred, and I have to focus my efforts elsewhere (depending on the fix, some test cases will also need to be adjusted and/or added). I should also note that the capitalization test "Returns the device name from pci.ids in capital letters" in pkg/device_plugin/device_plugin_test.go uses a device name that is already in all caps, so it does not actually exercise the conversion to uppercase.
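
To illustrate the point about the test input, here is a small standalone sketch (not taken from the repository's test suite). The `identity` function is a hypothetical stand-in for an implementation that forgets to capitalize, and `detectsMissingUpper` reports whether a given input would expose that bug.

```go
package main

import (
	"fmt"
	"strings"
)

// identity stands in for a buggy implementation that skips capitalization.
func identity(s string) string { return s }

// detectsMissingUpper reports whether a test built on input would fail
// against an implementation that never calls strings.ToUpper.
func detectsMissingUpper(input string) bool {
	want := strings.ToUpper(input)
	return identity(input) != want
}

func main() {
	// Made-up inputs: one already uppercase, one mixed case, one lowercase hex ID.
	for _, input := range []string{"ALREADY UPPER", "Mixed Case", "1eb8"} {
		fmt.Printf("%q detects missing ToUpper: %v\n", input, detectsMissingUpper(input))
	}
}
```

Only the mixed-case name and the lowercase ID expose the missing conversion. The all-caps input passes either way, which is why a lowercase or mixed-case test case would likely be worth adding alongside any fix.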