Closed KasperSkytte closed 2 years ago
Using the newest version of the Ansible role as of an hour or so ago. The target node is running Ubuntu 20.04 LTS.
Same problem here.
I can confirm I'm seeing the same thing. The driver role code hasn't changed since the last successful test, but this looks like an issue in the upstream repo. I'll check with the repo maintainers.
Confirmed there was an issue with the upstream apt package repository, and this should now be fixed.
Tested via triggering a CI run and confirming that all install paths are successful: https://github.com/NVIDIA/ansible-role-nvidia-driver/runs/6314471044
Please make sure to run apt-get update before attempting a new install. Closing this based on the successful test, but feel free to re-open if the issue persists.
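For reference, that sequence on a target node looks roughly like the following; this is only a sketch, and the playbook and inventory names are placeholders for whatever wraps this role:

# Refresh apt metadata so the repaired CUDA repository index is picked up
sudo apt-get update

# Then re-run the playbook that applies this role (file names here are hypothetical)
ansible-playbook -i inventory.ini site.yml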
Thanks for the quick response, but I still get the same error, even after reinstalling the role and updating the package info with sudo apt-get update. Fresh Ubuntu focal VM, everything default.
I don't have permission to re-open the issue.
Oops! Re-opening the issue myself and will kick off another test on my end (both through CI and on a local VM).
@KasperSkytte : Both the CI tests and my local VM tests are successfully installing the driver from the CUDA repos. This worked on all of Ubuntu 18.04, Ubuntu 20.04, and CentOS 7.
Can you confirm if you are still seeing this issue?
If so, can you please do the following to help troubleshoot? Run sudo apt-get update and sudo apt-get install cuda-drivers-510 manually, and provide a gist with the full log.
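One way to capture the full output of both commands for a gist is a sketch like this (the log path is arbitrary):

sudo apt-get update 2>&1 | tee /tmp/cuda-driver-install.log
sudo apt-get install cuda-drivers-510 2>&1 | tee -a /tmp/cuda-driver-install.log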
On my side things are settling down. The driver + docker Ansible roles are now doing their jobs on a bare new setup, and everything is installed from the CUDA repo successfully.
For example, on a machine with 460 drivers:
$ apt-cache policy nvidia-driver-460
nvidia-driver-460:
  Installed: 460.106.00-0ubuntu1
  Candidate: 460.106.00-0ubuntu1
  Version table:
     470.103.01-0ubuntu0.20.04.1 500
        500 http://fr.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages
        500 http://fr.archive.ubuntu.com/ubuntu focal-security/restricted amd64 Packages
 *** 460.106.00-0ubuntu1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
        100 /var/lib/dpkg/status
     460.91.03-0ubuntu1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
     460.73.01-0ubuntu1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
     460.32.03-0ubuntu1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
     460.27.04-0ubuntu1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
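On a machine where the install fails, the same kind of check helps confirm whether the CUDA repository is visible at all; a sketch, assuming the ubuntu2004 repo URL shown above:

# Does apt see the failing package, and from which repository?
apt-cache policy cuda-drivers-510

# Is the CUDA repo configured in the apt sources?
grep -r "developer.download.nvidia.com/compute/cuda/repos" /etc/apt/sources.list /etc/apt/sources.list.d/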
I think the main problem is dealing with machines where a previous version of this role was already run. In that case, I'm afraid:
Better handling of these cases should at least raise a warning or error in this role. The best approach might be a "cleaning" task that makes sure every old piece of this role, the NVIDIA repos, and the NVIDIA apt keys are truly cleaned up beforehand.
I can help by providing the system state and the files created by the "old" setup method.
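Until such a task exists, a manual cleanup is sketched below; the exact list filenames and key ID depend on how the old repo was added, so they are placeholders here and should be checked before deleting anything:

# Find apt sources left behind by an older CUDA/NVIDIA setup (filenames vary)
ls /etc/apt/sources.list.d/ | grep -iE 'cuda|nvidia'

# List trusted keys to identify the old CUDA signing key
apt-key list

# Remove the stale source file and key, then refresh metadata
sudo rm /etc/apt/sources.list.d/<old-cuda-repo>.list   # placeholder filename
sudo apt-key del <OLD_KEY_ID>                          # placeholder key ID
sudo apt-get update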
@ajdecon Thank you for being so persistent. I got it working now too. Turns out my "fresh" VM wasn't so fresh. Was confusing myself with multiple ones. Works out of the box now on Ubuntu 20. I'm closing again.
Agree with @BarthV that a few tasks to clean up after any other method(s) of installing the NVIDIA driver would be handy.
I was also struggling to install the driver with the Ansible role and faced this exact same issue, so I followed the installation guide. Because I wasn't able to install the cuda-keyring package, I followed the alternative steps described in section 3.8.3.2, and this allowed me to install the driver. For good measure, I re-ran Ansible after this, and the driver installed successfully. I hope this is useful to someone.
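For context, the cuda-keyring route that the guide describes normally amounts to something like the following; this is a sketch, and the exact package filename/version should be taken from the repo index rather than copied verbatim:

# Install the repository keyring package from the CUDA repo used above
# (the version in the filename is an assumption; check the repo listing first)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update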
When setting nvidia_driver_ubuntu_install_from_cuda_repo: yes I get an error that the package doesn't exist. There are no issues when setting nvidia_driver_ubuntu_install_from_cuda_repo: no, but I would like to install from the CUDA repository.
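For anyone else toggling that variable per run, it can also be passed as an extra var so it stays a real boolean; a sketch with hypothetical playbook and inventory names:

# Enable the CUDA-repo install path for this run only
ansible-playbook -i inventory.ini site.yml \
  -e '{"nvidia_driver_ubuntu_install_from_cuda_repo": true}'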