NVIDIA / yum-packaging-precompiled-kmod

NVIDIA precompiled kernel module packaging for RHEL
Apache License 2.0

Package flagged as installonly prevents yum update from succeeding #42

Closed txangel closed 1 year ago

txangel commented 1 year ago

At this line, https://github.com/NVIDIA/yum-packaging-precompiled-kmod/blob/main/yum-kmod-nvidia.spec#L75, the package appears to be flagged as an installonly package.

This leads to yum attempting an install when we request a package update:

Package 3:kmod-nvidia-latest-3.10.0-1160.49.1.r470.82.01.el7.post1.x86_64 is allowed multiple installs

This eventually causes a conflict: to install the new kmod, yum has to remove the old driver, and that would break the older kmod that is still installed.

It might be by design (installonly is documented as a way to prevent packages from being updated), but if so, it causes issues with automation tools like yum-cron and auter.

Is this intended or is it an oversight?

kmittman commented 1 year ago

I believe this is a workaround (WAR) for EL7 distros to allow multiple kmod packages (same NVIDIA driver, different kernel versions) to be installed at the same time.

As per: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sec-configuring_yum_and_yum_repositories

installonlypkgs=space separated list of packages
Here you can provide a space-separated list of packages which yum can install, but will never update. See the yum.conf(5) manual page for the list of packages which are install-only by default.

If you add the installonlypkgs directive to /etc/yum.conf, you should ensure that you list all of the packages that should be install-only, including any of those listed under the installonlypkgs section of yum.conf(5). In particular, kernel packages should always be listed in installonlypkgs (as they are by default), and installonly_limit should always be set to a value greater than 2 so that a backup kernel is always available in case the default one fails to boot.
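To check which of these directives a given host actually sets, a small helper can print them from the config file. This is a minimal sketch: the function name `show_installonly` and the sample file contents are illustrative assumptions, not something taken from this thread or from NVIDIA's packaging.

```shell
#!/bin/sh
# Hypothetical helper: print the install-only directives set in a yum config.
# Defaults to /etc/yum.conf; pass another path to inspect a different file.
show_installonly() {
    conf=${1:-/etc/yum.conf}
    grep -E '^[[:space:]]*(installonlypkgs|installonly_limit)[[:space:]]*=' "$conf"
}

# Demo against a throwaway sample file (values are illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[main]
gpgcheck=1
installonlypkgs=kernel kernel-devel
installonly_limit=3
EOF
show_installonly "$sample"
rm -f "$sample"
```

If the helper prints nothing for /etc/yum.conf, neither directive is set explicitly and yum falls back to its built-in defaults described in yum.conf(5).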

txangel commented 1 year ago

Thanks for the quick response @kmittman!

Yes, that's what it seems to allow, but the issue then arises when two of them, say 470 and 525, need to be installed. Both depend on nvidia-driver-latest, but 470 depends on nvidia-driver-latest = 3:470.xx.xx while 525 depends on nvidia-driver-latest = 3:525.xx.xx. When yum tries to install both kmods, it finds that it can't install both drivers.

Or at least that's what we are experiencing in our environments, where it hits this conflict:

nvidia-driver-latest = 3:470.xx.xx is needed by kmod-nvidia-latest-3:zzzr470zzz

(which then prevents it from installing 525)

I find it odd that this hasn't been reported before; perhaps it's exclusive to our environments for some reason I can't yet grasp. If you think that the spec supporting multiple installations is indeed as designed and a feature, feel free to close this.

kmittman commented 1 year ago

Are you trying to install 470.xx and 525.xx side by side (not supported), or do you want to upgrade from 470 to 525?

What I was referring to in my previous comment: say driver 470.182.03 is installed along with kernels 3.10.0-100 and 3.10.5-200. There could then be a kmod for each kernel, so that the .ko modules can load whichever Linux kernel you reboot into.
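The package name quoted earlier in this thread encodes both the kernel and the driver build, which is what makes one kmod per kernel possible. Below is a minimal sketch for pulling the two apart, assuming the `<kernel>.r<driver>.el<N>` layout seen in that one name holds for other builds too. The function name `parse_kmod` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a precompiled kmod package name into the kernel
# it was built for and the driver version it carries.
# Assumes the layout kmod-nvidia-<stream>-<kernel>.r<driver>.el<N>..., with an
# optional "<epoch>:" prefix, as in the name quoted in this thread.
# Prints nothing when the name does not match that layout.
parse_kmod() {
    printf '%s\n' "$1" | sed -n \
        's/^\([0-9]*:\)\{0,1\}kmod-nvidia-[a-z0-9-]*-\([0-9][0-9.]*-[0-9][0-9.]*\)\.r\([0-9][0-9.]*\)\.el[0-9].*$/kernel=\2 driver=\3/p'
}

# Example using the package name from the first comment.
parse_kmod '3:kmod-nvidia-latest-3.10.0-1160.49.1.r470.82.01.el7.post1.x86_64'
```

On a real host the input could come from `rpm -qa 'kmod-nvidia*'`, one name per line, to list which kernel and driver pairs are currently installed.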

Could you provide more details about which command is run? I understand there may be a config management engine involved, but if you could piece together the yum command, that would be helpful. For example:

sudo yum install nvidia-driver-latest

and the resultant NVIDIA RPMs that are installed

rpm -qa | grep nvidia | sort

Alternatively, if you are trying to stay on 470.xxx then

sudo yum install nvidia-driver-branch-470

txangel commented 1 year ago

We are trying to upgrade from 470 to 525, but the upgrade is driven by the kmod package.

We don't install nvidia-driver-latest directly (which may be what gets us into this).

The command it runs is a simple yum update:

Resolving Dependencies
[...]
Dependencies Resolved

====================================================================================================================================================================================================
 Package                                                  Arch                           Version                                                      Repository                               Size
====================================================================================================================================================================================================
Installing:
 kmod-nvidia-latest                                       x86_64                         3:zzzz.r525.zzzz                                             REDACTED                          REDACTED
Updating:
 nvidia-driver-latest                                     x86_64                         3:525.85.12-1.el7                                            REDACTED                             REDACTED
 nvidia-driver-latest-NVML                                x86_64                         3:525.85.12-1.el7                                            REDACTED                             REDACTED
 nvidia-driver-latest-NvFBCOpenGL                         x86_64                         3:525.85.12-1.el7                                            REDACTED                              REDACTED
 nvidia-driver-latest-cuda                                x86_64                         3:525.85.12-1.el7                                            REDACTED                             REDACTED
 nvidia-driver-latest-cuda-libs                           x86_64                         3:525.85.12-1.el7                                            REDACTED                              REDACTED
 nvidia-driver-latest-devel                               x86_64                         3:525.85.12-1.el7                                            REDACTED                              REDACTED
 nvidia-driver-latest-libs                                x86_64                         3:525.85.12-1.el7                                            REDACTED                             REDACTED
 nvidia-modprobe-latest                                   x86_64                         3:525.85.12-1.el7                                            REDACTED                              REDACTED
 nvidia-persistenced-latest                               x86_64                         3:525.85.12-1.el7                                            REDACTED                              REDACTED
 nvidia-xconfig-latest                                    x86_64                         3:525.85.12-1.el7                                            REDACTED                              REDACTED
Installing for dependencies:
 egl-wayland                                              x86_64                         1.1.6-1.el7                                                  REDACTED                                    REDACTED
Not available:
 kmod-nvidia-latest                                       x86_64                         3:zzz.r525.zzz                                                 -                                       0.0

Transaction Summary
====================================================================================================================================================================================================
Install         1 Package  (+1 Dependent package)
Upgrade        10 Packages
Not available   1 Package

Total download size: REDACTED
[...]
Total                                                                                                                                                               125 MB/s | 334 MB  00:00:02
Running transaction check
ERROR with transaction check vs depsolve:
nvidia-driver-latest = 3:470.zzz is needed by (installed) kmod-nvidia-latest-3:zzz.r470.zzz
 You could try running: rpm -Va --nofiles --nodigest

kmittman commented 1 year ago

I know dnf has the --allowerasing and --best flags, though I don't recall whether they apply to yum, or whether yum upgrade sets them where yum update does not.

txangel commented 1 year ago

Sadly, as far as I can tell, both yum upgrade and yum update --obsoletes fail because of the same conflict: yum cannot remove the old driver because that would break the old kmod package.

txangel commented 1 year ago

As a workaround, we have changed our infrastructure so that our systems do an uninstall followed by an install.
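The uninstall-then-install workaround could be scripted roughly as follows. This is a sketch under assumptions: the thread does not say which packages were removed or installed, so the package globs, the `run` and `reinstall_driver` helper names, and the `DRY_RUN` switch are all hypothetical. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical sketch of an "uninstall, then install" driver upgrade.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and
# run as root to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reinstall_driver() {
    # Remove the old kmod builds together with the old driver so that neither
    # is left behind depending on the other (package globs are assumptions).
    run yum -y remove 'kmod-nvidia-latest*' 'nvidia-driver-latest*'
    # Install the current driver; the kmod is expected to come in with it.
    run yum -y install nvidia-driver-latest
}

reinstall_driver
```

Removing the kmod and the driver in a single transaction sidesteps the "is needed by (installed) kmod-nvidia-latest" check, since nothing that requires the old driver remains once the transaction completes. The cost is a window with no driver installed, so this is probably best scheduled alongside a reboot or maintenance window.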

txangel commented 1 year ago

(thanks by the way) (: