elezar opened 3 weeks ago
This change treats errors in constructing vGPU labels as warnings.
If errors occur, the `nvidia.com/vgpu.present` label is set to `false` instead of an error being raised.
For example, on my Mac:
```
./gpu-feature-discovery --oneshot --output="" --node-name=foo
I0422 20:59:12.321562 63053 main.go:139] Starting OS watcher.
I0422 20:59:12.321919 63053 main.go:144] Loading configuration.
I0422 20:59:12.323056 63053 main.go:156] Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": true,
    "gdsEnabled": null,
    "mofedEnabled": null,
    "useNodeFeatureAPI": false,
    "gfd": {
      "oneshot": true,
      "noTimestamp": false,
      "sleepInterval": "1m0s",
      "outputFile": "",
      "machineTypeFile": "/sys/class/dmi/id/product_name"
    }
  },
  "resources": {
    "gpus": null
  },
  "sharing": {
    "timeSlicing": {}
  }
}
I0422 20:59:12.323797 63053 factory.go:49] Detected non-NVML platform: could not load NVML library: dlopen(libnvidia-ml.so.1, 0x0001): tried: 'libnvidia-ml.so.1' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibnvidia-ml.so.1' (no such file), '/usr/lib/libnvidia-ml.so.1' (no such file, not in dyld cache), 'libnvidia-ml.so.1' (no such file)
I0422 20:59:12.323835 63053 factory.go:49] Detected non-Tegra platform: /sys/devices/soc0/family file not found
W0422 20:59:12.323847 63053 factory.go:72] No valid resources detected; using empty manager.
I0422 20:59:12.323853 63053 main.go:170] Start running
E0422 20:59:12.323900 63053 vgpu.go:41] "unable to get vGPU devices" err="error getting NVIDIA specific PCI devices: unable to read PCI bus devices: open /sys/bus/pci/devices: no such file or directory"
I0422 20:59:12.323917 63053 main.go:239] Creating Labels
nvidia.com/gfd.timestamp=1713812352
nvidia.com/vgpu.present=false
I0422 20:59:12.323928 63053 main.go:136] Exiting
```