Closed nikito closed 3 weeks ago
Closing the issue. I'm not sure what I did, but I uninstalled the operator, then reinstalled 24.9.0 from scratch, and everything appears to be working now.
I can confirm the issue. I'm running v24.9.0 in K3S/Flux on Ubuntu 24.04 LTS with driver 535. Rolling back to v24.6.2 fixes the issue. Unlike @nikito, I did not manage to upgrade to v24.9.0 after the rollback (I tried uninstalling and reinstalling from scratch). Staying with v24.6.2 for now.
When upgrading to the latest gpu-operator v24.9.0, the nvidia-container-toolkit-daemonset fails to initialize with the following error: level=error msg="error running nvidia-toolkit: unable to determine runtime options: unable to load containerd config: failed to load config: failed to run command chroot [/host containerd config dump]: exit status 127"
If I roll back to v24.6.2, everything initializes correctly.
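A note on reading that error: GNU coreutils `chroot` reserves exit status 125 for a failure of `chroot` itself, 126 for a command that was found but could not be invoked, and 127 for a command that could not be found. So the message above may mean that `containerd` is not on the PATH inside `/host`. That is an assumption drawn from the exit code alone, not a confirmed root cause. The sketch below pulls the status out of a log line and maps it to that convention; the helper names `parse_exit_status` and `explain` are hypothetical and not part of the operator or toolkit.

```python
import re
from typing import Optional

# Exit statuses documented for GNU coreutils `chroot`.
# Any other status is passed through from the command that was run.
CHROOT_EXIT_MEANINGS = {
    125: "chroot itself failed",
    126: "command was found but could not be invoked",
    127: "command could not be found",
}


def parse_exit_status(log_line: str) -> Optional[int]:
    """Return the integer following 'exit status' in a log line, or None."""
    match = re.search(r"exit status (\d+)", log_line)
    return int(match.group(1)) if match else None


def explain(log_line: str) -> str:
    """Map the exit status found in a log line to its conventional meaning."""
    status = parse_exit_status(log_line)
    if status is None:
        return "no exit status found"
    return CHROOT_EXIT_MEANINGS.get(
        status, f"exit status {status} comes from the command itself"
    )


# The log line reported in this issue.
LOG_LINE = (
    'level=error msg="error running nvidia-toolkit: unable to determine '
    "runtime options: unable to load containerd config: failed to load "
    "config: failed to run command chroot [/host containerd config dump]: "
    'exit status 127"'
)

if __name__ == "__main__":
    print(explain(LOG_LINE))
```

If the status really does mean "command not found", checking whether a `containerd` binary is resolvable from inside the host root would be a reasonable next step before attempting the upgrade again.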