NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

KDE/SDDM fails to start on NVIDIA proprietary driver v560.35.03 + Kernel 6.11.0 (Could not initialize egl/EGL not available) #344167

Open opl- opened 1 month ago

opl- commented 1 month ago

Updating NixOS to nixpkgs c04d5652cfa9742b1d519688f65d1bbccea9eb7e results in SDDM crashing on startup with "Could not initialize egl" and "EGL not available" errors logged in the journal.

Additional context

 - nixpkgs: `c04d5652cfa9742b1d519688f65d1bbccea9eb7e`
 - Kernel: v6.11.0
 - NVIDIA driver: v560.35.03 (crashes with both the open and the non-open kernel module)
 - KDE: v6.1.5 (Wayland)
 - dGPU: NVIDIA RTX 3070 Ti Laptop

Previous working generation was running nixpkgs c374d94f1536013ca8e92341b540eba4c22f9c62 (Linux kernel v6.10.6 with the beta v560.31.02 NVIDIA driver).

# configuration.nix
boot.kernelPackages = pkgs.linuxPackages_latest;
services.xserver.videoDrivers = [ "nvidia" ];
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.beta;
hardware.nvidia.modesetting.enable = true;
hardware.nvidia.open = true; # crashes with both true and false
hardware.nvidia.powerManagement.enable = true;
hardware.nvidia.powerManagement.finegrained = false;
hardware.nvidia.prime.sync.enable = true;
sudo journalctl -b -1 | grep sddm

Nearly identical with open and non-open kernel module, the only difference being the `HDMI-A-1` display being named unknown.

```console
sddm[1664]: Greeter session started successfully
sddm-helper-start-wayland[1874]: Starting Wayland process "/nix/store/yxy38krm4jpq9f4xbb3i31bszyp5dvv3-kwin-6.1.5/bin/kwin_wayland --no-global-shortcuts --no-kactivities --no-lockscreen --locale1" "sddm"
sddm-helper-start-wayland[1874]: started succesfully "/nix/store/yxy38krm4jpq9f4xbb3i31bszyp5dvv3-kwin-6.1.5/bin/kwin_wayland --no-global-shortcuts --no-kactivities --no-lockscreen --locale1"
sddm-helper-start-wayland[1874]: "No backend specified, automatically choosing drm\n"
sddm-helper-start-wayland[1874]: Directory "/run/user/175" has changed, checking for Wayland socket
sddm-helper-start-wayland[1874]: Found Wayland socket "/run/user/175/wayland-0"
sddm-helper-start-wayland[1874]: "Accepting client connections on sockets: QList(\"wayland-0\")\n"
sddm-greeter-qt6[1893]: High-DPI autoscaling Enabled
sddm-helper-start-wayland[1874]: "\"applications.menu\" not found in QList(\"/run/current-system/sw/etc/xdg/menus\")\n"
sddm-helper-start-wayland[1874]: "kwin_scene_opengl: Creating the OpenGL rendering failed: \"Could not initialize egl\"\n"
sddm-greeter-qt6[1893]: Reading from "/nix/store/7j5hgwyngfx5vpdkyh29ar8bzg43xdip-desktops/share/wayland-sessions/plasma.desktop"
sddm-greeter-qt6[1893]: Reading from "/nix/store/7j5hgwyngfx5vpdkyh29ar8bzg43xdip-desktops/share/xsessions/plasmax11.desktop"
sddm-greeter-qt6[1893]: Loading theme configuration from "/run/current-system/sw/share/sddm/themes/breeze/theme.conf"
sddm-greeter-qt6[1893]: Connected to the daemon.
sddm[1664]: Message received from greeter: Connect
sddm-greeter-qt6[1893]: EGL not available
sddm-greeter-qt6[1893]: Loading file:///run/current-system/sw/share/sddm/themes/breeze/Main.qml...
sddm-greeter-qt6[1893]: failed to acquire GL context to resolve capabilities, using defaults..
sddm-greeter-qt6[1893]: Adding view for "HDMI-A-1" QRect(800,0 2048x1152)
sddm-greeter-qt6[1893]: Loading file:///run/current-system/sw/share/sddm/themes/breeze/Main.qml...
sddm-greeter-qt6[1893]: failed to acquire GL context to resolve capabilities, using defaults..
sddm-greeter-qt6[1893]: Adding view for "eDP-2" QRect(2848,0 1707x1067)
sddm-greeter-qt6[1893]: Loading file:///run/current-system/sw/share/sddm/themes/breeze/Main.qml...
sddm-greeter-qt6[1893]: failed to acquire GL context to resolve capabilities, using defaults..
sddm-greeter-qt6[1893]: Adding view for "Unknown-1" QRect(0,0 800x600)
sddm-greeter-qt6[1893]: Message received from daemon: Capabilities
sddm-greeter-qt6[1893]: Message received from daemon: HostName
sddm-greeter-qt6[1893]: QRhiGles2: Failed to create temporary context
sddm-greeter-qt6[1893]: QRhiGles2: Failed to create context
sddm-greeter-qt6[1893]: Failed to create RHI (backend 2)
sddm-greeter-qt6[1893]: Failed to initialize graphics backend for OpenGL.
systemd-coredump[2002]: Process 1893 (sddm-greeter-qt) of user 175 terminated abnormally with signal 6/ABRT, processing...
systemd-coredump[2003]: Process 1893 (sddm-greeter-qt) of user 175 dumped core.
    Module sddm-greeter-qt6 without build-id.
    #20 0x00000000004125b4 main (sddm-greeter-qt6 + 0x125b4)
    #23 0x0000000000412a25 _start (sddm-greeter-qt6 + 0x12a25)
sddm-helper-start-wayland[1874]: wayland greeter finished 6 QProcess::CrashExit
sddm-helper-start-wayland[1874]: quitting helper-start-wayland
sddm-helper-start-wayland[1874]: Stopping... "/nix/store/yxy38krm4jpq9f4xbb3i31bszyp5dvv3-kwin-6.1.5/bin/kwin_wayland"
sddm-helper-start-wayland[1874]: wayland compositor finished 15 QProcess::NormalExit
sddm-helper-start-wayland[1874]: quitting helper-start-wayland
sddm-helper[1764]: [PAM] Closing session
sddm-helper[1764]: pam_systemd(sddm-greeter:session): New sd-bus connection (system-bus-pam-systemd-1764) opened.
drkonqi-coredump-processor[2004]: "/nix/store/shlcpqycfm5ni30aigipjfig8lxg112w-sddm-unwrapped-0.21.0/bin/sddm-greeter-qt6" 1893 "/var/lib/systemd/coredump/core.sddm-greeter-qt.175.8d57ab7e4618474cabfaa73d494e5ada.1893.1727162623000000.zst"
drkonqi-coredump-launcher[2034]: Unable to find file for pid 1893 expected at "kcrash-metadata/sddm-greeter-qt6.8d57ab7e4618474cabfaa73d494e5ada.1893.ini"
sddm-helper[1764]: [PAM] Ended.
sddm[1664]: Auth: sddm-helper exited successfully
sddm[1664]: Greeter stopped. SDDM::Auth::HELPER_SUCCESS
(sd-pam)[1790]: pam_unix(systemd-user:session): session closed for user sddm
```

The simple-framebuffer section is not present in the drmdevice output when using my previous system generation.

nix shell nixpkgs#libdrm^bin -c drmdevice

```console
--- Checking the number of DRM device available ---
--- Devices reported 3 ---
--- Retrieving devices information (PCI device revision is ignored) ---
device[0]
+-> available_nodes 0x01
+-> nodes
|   +-> nodes[0] /dev/dri/card0
+-> bustype 0002
|   +-> platform
|       +-> fullname simple-framebuffer
+-> deviceinfo
    +-> platform
        +-> compatible simple-framebuffer
--- Opening device node /dev/dri/card0 ---
--- Retrieving device info, for node /dev/dri/card0 ---
device[0]
+-> available_nodes 0x01
+-> nodes
|   +-> nodes[0] /dev/dri/card0
+-> bustype 0002
|   +-> platform
|       +-> fullname simple-framebuffer
+-> deviceinfo
    +-> platform
        +-> compatible simple-framebuffer
device[1]
+-> available_nodes 0x05
+-> nodes
|   +-> nodes[0] /dev/dri/card2
|   +-> nodes[2] /dev/dri/renderD129
+-> bustype 0000
|   +-> pci
|       +-> domain 0000
|       +-> bus 01
|       +-> dev 00
|       +-> func 0
+-> deviceinfo
    +-> pci
        +-> vendor_id 10de
        +-> device_id 24a0
        +-> subvendor_id 1043
        +-> subdevice_id 1a8c
        +-> revision_id IGNORED
--- Opening device node /dev/dri/card2 ---
--- Retrieving device info, for node /dev/dri/card2 ---
device[1]
+-> available_nodes 0x05
+-> nodes
|   +-> nodes[0] /dev/dri/card2
|   +-> nodes[2] /dev/dri/renderD129
+-> bustype 0000
|   +-> pci
|       +-> domain 0000
|       +-> bus 01
|       +-> dev 00
|       +-> func 0
+-> deviceinfo
    +-> pci
        +-> vendor_id 10de
        +-> device_id 24a0
        +-> subvendor_id 1043
        +-> subdevice_id 1a8c
        +-> revision_id a1
--- Opening device node /dev/dri/renderD129 ---
--- Retrieving device info, for node /dev/dri/renderD129 ---
device[1]
+-> available_nodes 0x05
+-> nodes
|   +-> nodes[0] /dev/dri/card2
|   +-> nodes[2] /dev/dri/renderD129
+-> bustype 0000
|   +-> pci
|       +-> domain 0000
|       +-> bus 01
|       +-> dev 00
|       +-> func 0
+-> deviceinfo
    +-> pci
        +-> vendor_id 10de
        +-> device_id 24a0
        +-> subvendor_id 1043
        +-> subdevice_id 1a8c
        +-> revision_id a1
device[2]
+-> available_nodes 0x05
+-> nodes
|   +-> nodes[0] /dev/dri/card1
|   +-> nodes[2] /dev/dri/renderD128
+-> bustype 0000
|   +-> pci
|       +-> domain 0000
|       +-> bus 00
|       +-> dev 02
|       +-> func 0
+-> deviceinfo
    +-> pci
        +-> vendor_id 8086
        +-> device_id 46a6
        +-> subvendor_id 1043
        +-> subdevice_id 1a8c
        +-> revision_id IGNORED
--- Opening device node /dev/dri/card1 ---
--- Retrieving device info, for node /dev/dri/card1 ---
device[2]
+-> available_nodes 0x05
+-> nodes
|   +-> nodes[0] /dev/dri/card1
|   +-> nodes[2] /dev/dri/renderD128
+-> bustype 0000
|   +-> pci
|       +-> domain 0000
|       +-> bus 00
|       +-> dev 02
|       +-> func 0
+-> deviceinfo
    +-> pci
        +-> vendor_id 8086
        +-> device_id 46a6
        +-> subvendor_id 1043
        +-> subdevice_id 1a8c
        +-> revision_id 0c
--- Opening device node /dev/dri/renderD128 ---
--- Retrieving device info, for node /dev/dri/renderD128 ---
device[2]
+-> available_nodes 0x05
+-> nodes
|   +-> nodes[0] /dev/dri/card1
|   +-> nodes[2] /dev/dri/renderD128
+-> bustype 0000
|   +-> pci
|       +-> domain 0000
|       +-> bus 00
|       +-> dev 02
|       +-> func 0
+-> deviceinfo
    +-> pci
        +-> vendor_id 8086
        +-> device_id 46a6
        +-> subvendor_id 1043
        +-> subdevice_id 1a8c
        +-> revision_id 0c
```

Notify maintainers

@Kiskae @edwtjo

Metadata

Please run `nix-shell -p nix-info --run "nix-info -m"` and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 6.11.0, NixOS, 24.11 (Vicuna), 24.11.20240919.c04d565`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.5`
 - nixpkgs: `/nix/store/hiasfhl8f5yy88hcfbr3s8s4bm63wsjw-source`


opl- commented 1 month ago

Linking issue #343774 as it might be related, but the errors in the logs given there differ from mine.

This comment on that issue links to an Arch forum thread, where someone explains that the issue is caused by "simpledrm" not being automatically disabled by the NVIDIA driver due to header changes in kernel v6.11.0.

With the open kernel modules, SDDM and KDE start correctly when I apply the suggested workaround of adding `initcall_blacklist=simpledrm_platform_driver_init` to the kernel parameters, but console TTYs then freeze almost immediately during boot, permanently showing only the first two lines of the boot log. Without the open kernel module, I think KDE crashed twice.

To quickly test if this will fix the issue, I selected the NixOS generation with kernel v6.11.0 in grub, pressed [e], then added initcall_blacklist=simpledrm_platform_driver_init at the end of the text box at the bottom, separated from the rest by a space, and pressed [enter] to boot.
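To make the workaround persistent instead of editing the boot entry on every boot, the same parameter can presumably be set through the standard `boot.kernelParams` option. A minimal sketch; note that it carries the same TTY-freeze side effect described above:

```nix
{ ... }:
{
  # Workaround from the linked Arch forum thread: stop the simpledrm driver
  # from initializing, so it does not register its own DRM device
  # (the "simple-framebuffer" card0 in the drmdevice output).
  # Side effect observed in this thread: console TTYs may freeze during boot.
  boot.kernelParams = [ "initcall_blacklist=simpledrm_platform_driver_init" ];
}
```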

opl- commented 1 month ago

There's already a PR to the NVIDIA open-gpu-kernel-modules repository which adds support for the renamed kernel header files.

I tried to test it with the following NixOS configuration change after merging the PR into the v560.35.03 kernel module. I think this is technically incorrect as I'm not globally overriding the linuxPackages.nvidia_x11 package, but the Nix documentation again failed to assist me in doing that.

As a result, SDDM no longer crashed, but it didn't render correctly either and stayed a black screen. I only realized it was running because it briefly flashed (at the wrong resolution) when I switched to a console TTY.

After I blindly entered my password into the black SDDM screen, KDE crashed with the same errors as in #343774.

I guess I'm finally experiencing the reasons why people always say not to run the latest kernel with NVIDIA proprietary drivers.

{ config, pkgs, ... }: {
  # This does not work. Kind of.
  hardware.nvidia.open = true;
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.beta.overrideAttrs {
    open = config.boot.kernelPackages.nvidiaPackages.beta.open.overrideAttrs {
      src = pkgs.fetchFromGitHub {
        owner = "opl-";
        repo = "open-gpu-kernel-modules";
        # "main" is a moving branch; pin a commit so the hash below stays valid.
        rev = "main";
        hash = "sha256-SzbXewSU1Mn8uFtLlDGiJKJSEkXBoTRpLlFzlvZiliU=";
      };
    };
  };
}

opl- commented 1 month ago

And indeed, kernel v6.10.11 (`boot.kernelPackages = pkgs.linuxPackages_6_10;`) works fine with the NVIDIA proprietary v560.35.03 driver and the open kernel module.
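For reference, a sketch of that working combination as a NixOS module, assuming the driver is still taken from `nvidiaPackages.beta` as in the original configuration. The `linuxPackages_6_10` attribute only works while nixpkgs still ships that kernel; a later comment reports that rebuilding with it fails because it reached end of life upstream.

```nix
{ config, pkgs, ... }:
{
  # Pin the kernel to the 6.10 series, which still works with 560.35.03.
  boot.kernelPackages = pkgs.linuxPackages_6_10;

  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia = {
    # Driver built against the pinned kernel's package set.
    package = config.boot.kernelPackages.nvidiaPackages.beta;
    open = true; # open kernel module, as tested in this comment
    modesetting.enable = true;
  };
}
```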

VeilSilence commented 1 month ago

Nvidia issue. Stay at 6.10 until new driver release.

djmaze commented 1 week ago

With kernel 6.11.5 (`boot.kernelPackages = pkgs.linuxPackages_latest;`), the latest beta NVIDIA driver seems to work for me. Running 24.05 stable, I did this:

    package = config.boot.kernelPackages.nvidiaPackages.mkDriver {
      version = "565.57.01";
      sha256_64bit = "sha256-buvpTlheOF6IBPWnQVLfQUiHv4GcwhvZW3Ks0PsYLHo=";
      sha256_aarch64 = "sha256-aDVc3sNTG4O3y+vKW87mw+i9AqXCY29GVqEIUlsvYfE=";
      openSha256 = "sha256-/tM3n9huz1MTE6KKtTCBglBMBGGL/GOHi5ZSUag4zXA=";
      settingsSha256 = "sha256-H7uEe34LdmUFcMcS6bz7sbpYhg9zPCb/5AmZZFTx1QA=";
      persistencedSha256 = "sha256-hdszsACWNqkCh8G4VBNitDT85gk9gJe1BlQ8LdrYIkg=";
    };

I had to disable nvidia-settings, though, because of a compilation error:

    # The nvidia-settings build is currently broken due to a missing
    # vulkan header; re-enable once commit
    # 0384602eac8bc57add3227688ec242667df3ffe3 hits stable.
    nvidiaSettings = false;

Also, booting the system with an external monitor attached makes the system freeze instantly when loading the kernel on my device (ProArt PX13), so for now I disconnect it before booting the machine.
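Put together, the changes from this comment would look roughly like the module below. This is a sketch assembled from the two fragments above; the `{ config, pkgs, ... }:` wrapper and the nesting under `hardware.nvidia` are assumptions about where those fragments were placed. The hashes are the ones posted above and only match driver version 565.57.01.

```nix
{ config, pkgs, ... }:
{
  # Latest kernel (6.11.5 at the time of the comment).
  boot.kernelPackages = pkgs.linuxPackages_latest;

  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia = {
    # Build the 565.57.01 beta driver directly with mkDriver.
    package = config.boot.kernelPackages.nvidiaPackages.mkDriver {
      version = "565.57.01";
      sha256_64bit = "sha256-buvpTlheOF6IBPWnQVLfQUiHv4GcwhvZW3Ks0PsYLHo=";
      sha256_aarch64 = "sha256-aDVc3sNTG4O3y+vKW87mw+i9AqXCY29GVqEIUlsvYfE=";
      openSha256 = "sha256-/tM3n9huz1MTE6KKtTCBglBMBGGL/GOHi5ZSUag4zXA=";
      settingsSha256 = "sha256-H7uEe34LdmUFcMcS6bz7sbpYhg9zPCb/5AmZZFTx1QA=";
      persistencedSha256 = "sha256-hdszsACWNqkCh8G4VBNitDT85gk9gJe1BlQ8LdrYIkg=";
    };
    # nvidia-settings failed to build on 24.05 (missing vulkan header).
    nvidiaSettings = false;
  };
}
```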

Murazaki commented 1 week ago

I could not stay on 6.10, as I can't rebuild with it "because it reached end of life upstream". KDE also still crashes after a few minutes and refuses to reboot. Trying to switch to beta (565.57.01) like @djmaze.

No nvidia-settings build issue for me.

Edit: getting an NVIDIA driver mismatch issue...

Edit: fixed by removing the NVIDIA module from `boot.extraModulePackages` (or by using `nvidia_x11_beta` there instead):

# boot.extraModulePackages = [ config.boot.kernelPackages.nvidia_x11 ];
boot.extraModulePackages = [ config.boot.kernelPackages.nvidia_x11_beta ];
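Another way to avoid the mismatch is to drop the manual entry entirely and select the driver in one place. A sketch, assuming (as in current nixpkgs) that the `hardware.nvidia` module already adds the kernel module built from `hardware.nvidia.package` to `boot.extraModulePackages`, so the userspace driver and kernel module versions stay in sync:

```nix
{ config, ... }:
{
  services.xserver.videoDrivers = [ "nvidia" ];

  # Select the beta driver once; the hardware.nvidia module is assumed to
  # install the matching kernel module on its own.
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.beta;

  # No manual boot.extraModulePackages entry for nvidia_x11 here: loading the
  # stable nvidia_x11 module next to a beta userspace appears to be what
  # caused the driver mismatch reported above.
}
```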
Murazaki commented 1 week ago

After several hours of running, I can confirm NVIDIA 565.57.01 is much more stable than previous versions. It fixed SDDM not starting and the KDE crash. No issues with Electron or Firefox (though that might be due to the Firefox 131 update). Games build shaders and run properly with high performance.

mksafavi commented 1 week ago

> Confirming using nvidia 565.57.01 is much more stable than previous versions after several hours of running.

Great 👍 Is that with kernel >6.11 ?

> Could not stay on 6.10 as I can't rebuild with it "because it reached end of life upstream".

I'm still on 6.10 on my NVIDIA machine and didn't notice this issue. I switched to 6.10 with this:

boot.kernelPackages = pkgs.linuxPackages_6_10;
Murazaki commented 1 week ago

> Great 👍 Is that with kernel >6.11 ?

Yes, this is on the latest kernel:

$ uname -r
6.11.5
NovaViper commented 6 days ago

I can confirm that v565.57.01 works with 6.11.5-xanmod1!

RedEtherbloom commented 23 hours ago

We can also confirm this on kernel 6.11.5. Troubles first began on 6.11, went away when we had to upgrade to 6.11. Switching to the driver's beta branch fixed them for us again.