QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

sdubby & grubby-dummy: conflicting dependencies when installing akmod-nvidia #9556

Open RandyTheOtter opened 1 week ago

RandyTheOtter commented 1 week ago


### Qubes OS release

4.2.3

### Brief summary

Certain NVIDIA dGPU drivers cannot be installed in Fedora qubes. At least akmod-nvidia 3:560.35.03-1.fc40.x86_64 (and the fc39 build, although Fedora 39 is approaching EOL) and xorg-x11-drv-nvidia-3:560.35.03-5.fc40.x86_64 are affected.

### Steps to reproduce

  1. Get an NVIDIA card that needs the aforementioned drivers
  2. Install the fedora-40-xfce template and update it
  3. Prepare the standalone (set your own pcidevs):

         {% if grains['id'] == 'dom0' %}

         nvidia-driver--create-qube:
           qvm.vm:

         {% elif grains['id'] == 'f40-standalone-nvdrv' %}

         nvidia-driver--enable-repo:
           cmd.run:

         {% endif %}

  4. Start it up (you may have to use the console; on my machine the GUI agent doesn't work at this stage)
  5. Try to install `akmod-nvidia`

### Expected behavior

The software installs as usual, and I can proceed to waiting for akmod to build.

### Actual behavior

    Last metadata expiration check: 0:46:44 ago on Sat Nov 2 20:19:16 2024.
    Dependencies resolved.

     Package                       Arch   Version                           Repository                 Size
    Installing:
     akmod-nvidia                  x86_64 3:550.67-1.fc40                   rpmfusion-nonfree          40 k
    Installing dependencies:
     akmods                        noarch 0.5.8-8.fc40                      fedora                     32 k
     bison                         x86_64 3.8.2-7.fc40                      fedora                    1.0 M
     cmake-filesystem              x86_64 3.28.2-1.fc40                     fedora                     18 k
     egl-gbm                       x86_64 2:1.1.2^20240919gitb24587d-3.fc40 updates                    21 k
     egl-wayland                   x86_64 1.1.17^20241016git0cd471d-3.fc40  updates                    44 k
     elfutils-libelf-devel         x86_64 0.192-4.fc40                      updates                    47 k
     flex                          x86_64 2.6.4-16.fc40                     fedora                    299 k
     kernel-devel                  x86_64 6.11.5-200.fc40                   updates                    21 M
     kernel-devel-matched          x86_64 6.11.5-200.fc40                   updates                   183 k
     kmodtool                      noarch 1.1-10.fc40                       fedora                     16 k
     libgit2                       x86_64 1.7.2-4.fc40                      updates                   543 k
     libssh2                       x86_64 1.11.0-4.fc40                     fedora                    130 k
     libzstd-devel                 x86_64 1.5.6-1.fc40                      updates                    52 k
     llhttp                        x86_64 9.2.1-1.fc40                      updates                    33 k
     m4                            x86_64 1.4.19-9.fc40                     fedora                    305 k
     nvidia-modprobe               x86_64 3:550.67-1.fc40                   rpmfusion-nonfree          32 k
     nvidia-settings               x86_64 3:550.67-1.fc40                   rpmfusion-nonfree         1.6 M
     openssl                       x86_64 1:3.2.2-3.fc40                    updates                   1.1 M
     openssl-devel                 x86_64 1:3.2.2-3.fc40                    updates                   2.8 M
     python3-argcomplete           noarch 3.5.1-1.fc40                      updates                    96 k
     python3-babel                 noarch 2.16.0-1.fc40                     updates                   6.5 M
     python3-click-plugins         noarch 1.1.1-19.fc40                     fedora                     17 k
     python3-progressbar2          noarch 3.53.2-11.fc40                    fedora                     72 k
     python3-pygit2                x86_64 1.14.0-1.fc40                     fedora                    286 k
     python3-rpmautospec-core      noarch 0.1.5-1.fc40                      updates                    15 k
     python3-typing-extensions     noarch 4.12.2-2.fc40                     updates                    89 k
     python3-utils                 noarch 3.7.0-3.fc40                      fedora                     69 k
     rpmdevtools                   noarch 9.6-7.fc40                        fedora                     96 k
     time                          x86_64 1.9-23.fc40                       fedora                     47 k
     xorg-x11-drv-nvidia           x86_64 3:550.67-1.fc40                   rpmfusion-nonfree         126 M
     xorg-x11-drv-nvidia-kmodsrc   x86_64 3:550.67-1.fc40                   rpmfusion-nonfree          44 M
     xorg-x11-drv-nvidia-libs      x86_64 3:550.67-1.fc40                   rpmfusion-nonfree          59 M
     zlib-ng-compat-devel          x86_64 2.1.7-2.fc40                      updates                    38 k
    Installing weak dependencies:
     python3-rpmautospec           noarch 0.7.3-1.fc40                      updates                    74 k
     xorg-x11-drv-nvidia-cuda-libs x86_64 3:550.67-1.fc40                   rpmfusion-nonfree          41 M
     xorg-x11-drv-nvidia-power     x86_64 3:550.67-1.fc40                   rpmfusion-nonfree         103 k
    Skipping packages with conflicts:
    (add '--best --allowerasing' to command line to force their upgrade):
     sdubby                        noarch 1.0-8.fc40                        fedora                     18 k
     sdubby                        noarch 1.0-11.fc40                       updates                    19 k
    Skipping packages with broken dependencies:
     akmod-nvidia                  x86_64 3:560.35.03-1.fc40                rpmfusion-nonfree-updates  42 k
     xorg-x11-drv-nvidia           x86_64 3:560.35.03-5.fc40                rpmfusion-nonfree-updates 133 M

    Transaction Summary
    Install  37 Packages
    Skip      4 Packages

    Total download size: 307 M
    Installed size: 776 M
    Is this ok [y/N]:



### Other notes and links

It is possible to delete `grubby-dummy`. In that case `akmod-nvidia` builds and seems to be functional, but the GUI agent doesn't work. It may fail for the same reason it stops working after the third reproduction step, and it may need its own issue; I haven't figured this out yet. Keep in mind that for akmod to build you must extend `/tmp/`: the default 1 GB is not enough. You can use my salt state to reproduce everything after the deletion of `grubby-dummy`.
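
For the `/tmp/` step, here is a minimal sketch. It assumes `/tmp` is a tmpfs mount in the standalone; `grow_tmp` is a hypothetical helper name, and any size passed to it (for example `4G`) is a guess rather than a measured requirement.

```shell
# Hypothetical helper: resize the tmpfs mounted at /tmp.
# Assumes /tmp is a tmpfs; the change lasts only until the qube restarts.
grow_tmp() {
    # $1 is the new size, e.g. 4G (an assumed value, adjust as needed)
    sudo mount -o remount,size="$1" /tmp
}
```

Calling `grow_tmp 4G` before installing `akmod-nvidia` and then checking `df -h /tmp` should show whether the new size took effect.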

[nvidia-driver.sls](https://github.com/user-attachments/files/17608430/nvidia-driver.txt)

It should be possible to install an older driver version, since it doesn't have this dependency problem. I tried, and it only works if the driver was installed before and the system hasn't been updated since. On a new system this can most likely be solved by rolling back the kernel. That said, this is an old version of the driver and kernel anyway, and I expect them to be deprecated at some point.

Related forum posts:
- [Nvidia drivers dependencies problem; fedora39](https://forum.qubes-os.org/t/nvidia-drivers-dependencies-problem-fedora39/27979)
- [Nvidia proprietary driver installation help in 4.2](https://forum.qubes-os.org/t/nvidia-proprietary-driver-installation-help-in-4-2/29432)

RandyTheOtter commented 1 week ago

A small update: I have figured out the problems with the GUI daemon, and deleting `grubby-dummy` seems to work, but I haven't tested it much yet.

DemiMarie commented 6 days ago

Why is something pulling in sdubby? That is for systemd-boot and neither Fedora nor Qubes OS uses that by default.

marmarek commented 6 days ago

Yes, this is a very interesting question. Generally, grubby (and probably sdubby too) is a rather broken concept for maintaining bootloader config, and it has caused several issues in the past. I'm not sure about sdubby, but grubby tries to parse the generated grub.cfg and edit it to add new entries based on existing ones (contrary to the huge comment at the top saying not to edit it). Some upstream discussion: https://bugzilla.redhat.com/show_bug.cgi?id=1287854 </rant>

That's why we have grubby-dummy: to avoid pulling in the real grubby package even if something tries to. The fix for this ticket should include checking what name is pulled in via deps and adding the appropriate `Provides:` to the grubby-dummy package (so the real grubby/sdubby is no longer pulled in). I wouldn't expect anything to break; all the places using grubby that I've seen have a fallback to a proper config generator (grub2-mkconfig).
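
A rough sketch of that check, not a tested procedure: the helper names are hypothetical, while `rpm -q --provides` and `dnf repoquery --requires` are standard queries. The package passed in is whichever one needs inspecting.

```shell
# Hypothetical helper: list the capabilities the installed grubby-dummy provides.
dummy_provides() {
    rpm -q --provides grubby-dummy
}

# Hypothetical helper: show which grubby/sdubby names a package requires.
# $1 is the package to inspect, e.g. akmods or xorg-x11-drv-nvidia.
bootloader_requires() {
    dnf repoquery -q --requires "$1" | grep -E 'grubby|sdubby'
}
```

Any name printed by `bootloader_requires` that is missing from the `dummy_provides` output would be a candidate for a new `Provides:` line.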

RandyTheOtter commented 5 days ago

@DemiMarie both akmods (not akmod-nvidia) and xorg-x11-drv-nvidia depend on grubby, and nothing except systemd-udev depends on sdubby directly.

    $ repoquery -q --installed --whatrequires sdubby
    systemd-udev-0:255.13-1.fc40.x86_64
    $ repoquery -q --installed --whatrequires grubby
    akmods-0:0.5.8-8.fc40.noarch
    systemd-udev-0:255.13-1.fc40.x86_64
    xorg-x11-drv-nvidia-3:560.35.03-5.fc40.x86_64

I don't know why, though.
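
One way to dig further, as a sketch only: ask which packages in the enabled repos claim to provide the `grubby` name, since any of them could be picked by the resolver to satisfy the requirements above. `who_provides` is a hypothetical helper name, and whether sdubby appears in its output has not been confirmed in this thread.

```shell
# Hypothetical helper: list packages from enabled repos that provide a capability.
# $1 is the capability name to look up, e.g. grubby.
who_provides() {
    dnf repoquery -q --whatprovides "$1"
}
```

Running `who_provides grubby` inside the standalone would list the candidates.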