Open sunwuyan opened 2 months ago
@sunwuyan the driver will always be reinstalled after a reboot; this is a current limitation. Please see this comment: https://github.com/NVIDIA/gpu-operator/issues/705#issuecomment-2077761858
Thanks. I looked at the code, and it seems that if the driver.usePrecompiled property is set to true, the driver container shouldn't need to update the package cache over the network again. I haven't tried it yet, though. My operating system is Ubuntu 20.04.
Correct. If precompiled drivers are used, the driver container does not need network connectivity to update the package cache.
However, we do not publish precompiled driver images for Ubuntu 20.04. Tags are only available for Ubuntu 22.04; see https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/precompiled-drivers.html#limitations-and-restrictions
Thanks.
After successfully integrating the GPU with gpu-operator, is there a way to avoid reinstalling the driver when the GPU node restarts? My K8s cluster normally has no access to the public network, so every time the nvidia-driver-daemonset pod restarts, it needs network access to finish starting up. Otherwise it fails with the following error:
========== NVIDIA Software Installer ==========

Starting installation of NVIDIA driver version 550.54.14 for Linux kernel version 5.15.0-67-generic

Stopping NVIDIA persistence daemon...
Unloading NVIDIA driver kernel modules...
Unmounting NVIDIA driver rootfs...
Checking NVIDIA driver packages...
Updating the package cache...
E: The repository 'http://archive.ubuntu.com/ubuntu focal InRelease' is not signed.
E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/focal/InRelease Clearsigned file isn't valid, got 'NOSPLIT' (does the network require authentication?)
E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/focal-updates/InRelease Clearsigned file isn't valid, got 'NOSPLIT' (does the network require authentication?)
E: The repository 'http://archive.ubuntu.com/ubuntu focal-updates InRelease' is not signed.
E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/focal-security/InRelease Clearsigned file isn't valid, got 'NOSPLIT' (does the network require authentication?)
E: The repository 'http://archive.ubuntu.com/ubuntu focal-security InRelease' is not signed.
Stopping NVIDIA persistence daemon...
Unloading NVIDIA driver kernel modules...
Unmounting NVIDIA driver rootfs...
I also tried setting driver.upgradePolicy.autoUpgrade to false, but that didn't help either.