NVIDIA / ansible-role-nvidia-driver

BSD 3-Clause "New" or "Revised" License

Fix default package list for upgrade workflow #33

Closed: ajdecon closed this 3 years ago

ajdecon commented 3 years ago

Due to the way the package dependencies are structured, trying to upgrade to a new driver branch with the Canonical packages currently fails.

For example, if we install the 450 driver branch with DeepOps, and then try to run the playbook again with nvidia_driver_ubuntu_branch: 460, we see an error like this:

TASK [nvidia.nvidia_driver : install driver packages] ***********************************************************************************
failed: [gpu01] (item=['nvidia-headless-460-server', 'nvidia-utils-460-server']) => changed=false
  ansible_loop_var: item
  cache_update_time: 1615947575
  cache_updated: false
  item:
  - nvidia-headless-460-server
  - nvidia-utils-460-server
  msg: |-
    '/usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"      install 'nvidia-headless-460-server' 'nvidia-utils-460-server'' failed: E: Unable to correct problems, you have held broken packages.
  rc: 100
  stderr: |-
    E: Unable to correct problems, you have held broken packages.
  stderr_lines: <omitted>
  stdout: |-
    Reading package lists...
    Building dependency tree...
    Reading state information...
    Some packages could not be installed. This may mean that you have
    requested an impossible situation or if you are using the unstable
    distribution that some required packages have not yet been created
    or been moved out of Incoming.
    The following information may help to resolve the situation:
    The following packages have unmet dependencies:
     nvidia-headless-460-server : Depends: nvidia-headless-no-dkms-460-server but it is not going to be installed
  stdout_lines: <omitted>

Adding nvidia-headless-no-dkms-{{ nvidia_driver_ubuntu_branch }}-server to the list of packages we specify explicitly as part of the install appears to resolve the issue. The playbook runs successfully, and the package list afterwards shows that all installed packages are from the new driver branch (460). (Note that all 450 packages are in the rc state: removed, with only their configuration files remaining.)

vagrant@ubuntu1804:~$ dpkg -l | grep nvidia
ii  libnvidia-cfg1-460-server:amd64       460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA binary OpenGL/GLX configuration library
rc  libnvidia-compute-450-server:amd64    450.102.04-0ubuntu0.18.04.1       amd64        NVIDIA libcompute package
ii  libnvidia-compute-460-server:amd64    460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA libcompute package
rc  nvidia-compute-utils-450-server       450.102.04-0ubuntu0.18.04.1       amd64        NVIDIA compute utilities
ii  nvidia-compute-utils-460-server       460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA compute utilities
rc  nvidia-dkms-450-server                450.102.04-0ubuntu0.18.04.1       amd64        NVIDIA DKMS package
ii  nvidia-dkms-460-server                460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA DKMS package
ii  nvidia-headless-460-server            460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA headless metapackage
ii  nvidia-headless-no-dkms-460-server    460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA headless metapackage - no DKMS
rc  nvidia-kernel-common-450-server       450.102.04-0ubuntu0.18.04.1       amd64        Shared files used with the kernel module
ii  nvidia-kernel-common-460-server       460.32.03-0ubuntu0.18.04.1        amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-460-server       460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA kernel source package
ii  nvidia-utils-460-server               460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA Server Driver support binaries
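
For reference, the change described above can be sketched roughly as the following defaults fragment. This is an illustration only: the variable name nvidia_driver_ubuntu_packages is an assumption (the thread names only nvidia_driver_ubuntu_branch), and the actual list in the role may contain other entries.

```yaml
# Sketch only. nvidia_driver_ubuntu_packages is a hypothetical variable name
# used for illustration; only nvidia_driver_ubuntu_branch appears in the thread.
nvidia_driver_ubuntu_branch: "460"

nvidia_driver_ubuntu_packages:
  - "nvidia-headless-{{ nvidia_driver_ubuntu_branch }}-server"
  # Listed explicitly so apt can satisfy the dependency of the headless
  # metapackage when moving from one driver branch to another.
  - "nvidia-headless-no-dkms-{{ nvidia_driver_ubuntu_branch }}-server"
  - "nvidia-utils-{{ nvidia_driver_ubuntu_branch }}-server"
```

With the branch set to 460, the three entries expand to the nvidia-headless-460-server, nvidia-headless-no-dkms-460-server, and nvidia-utils-460-server packages shown as installed (ii) in the dpkg output above.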

acoastalfog commented 3 years ago

Upgrade path from 450 to 460 failed on my single node install without

in addition.

ajdecon commented 3 years ago

@acoastalfog : Hmm. That wasn't needed in my testing, but OTOH I don't see a downside to including it in the explicit package list. Added!