NixOS / nixpkgs


NVIDIA proprietary drivers fail on Wayland with the latest Linux kernel in NixOS unstable (24.11) #343774

Closed ccicnce113424 closed 1 month ago

ccicnce113424 commented 1 month ago

Describe the bug

I'm encountering an issue with the NVIDIA proprietary drivers when using the latest Linux kernel on NixOS 24.11 (Vicuna) (version: 24.11.20240919.c04d565). When configured to use the latest kernel (boot.kernelPackages = pkgs.linuxPackages_latest;), SDDM fails to start on Wayland and results in a black screen. Occasionally, a frozen cursor appears at a low resolution on top of the black screen, and moving the mouse has no effect.

Additionally, when booting with the latest kernel, the system remains stuck on the Plymouth boot screen for a noticeably longer time compared to the stable kernel.

A similar issue has been reported in #323396.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Add the following NixOS configuration:
    boot.kernelPackages = pkgs.linuxPackages_latest;
  2. Rebuild and restart the system.
  3. Attempt to run SDDM on Wayland or launch a KDE session on Wayland from X11.
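A minimal sketch of a configuration that would exercise these steps. Only the boot.kernelPackages line comes from the report; the NVIDIA, SDDM, and Plasma options are assumptions about a typical setup, since the reporter's full configuration is not shown.

```nix
{ config, pkgs, ... }:
{
  # From the report: track the newest kernel packaged in nixpkgs.
  boot.kernelPackages = pkgs.linuxPackages_latest;

  # Assumed: load the NVIDIA driver with kernel modesetting enabled.
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia.modesetting.enable = true;
  hardware.nvidia.open = false; # proprietary kernel module

  # Assumed: SDDM running as a Wayland greeter, with a Plasma 6 session.
  services.displayManager.sddm.enable = true;
  services.displayManager.sddm.wayland.enable = true;
  services.desktopManager.plasma6.enable = true;
}
```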

Expected behavior

SDDM and KDE should run on Wayland with the latest Linux kernel and the NVIDIA proprietary driver, without black screen or frozen cursor issues.

Screenshots

N/A

Additional context

I have tested both the stable and beta versions of the NVIDIA drivers, as well as using both open and proprietary kernel modules, but the problem persists in all cases. Only reverting to the stable kernel resolves the issue. This issue occurs every time I use the latest kernel.
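The combinations described above could be expressed roughly as follows. This is a sketch only: the exact options used in testing are not shown in the report, and pkgs.linuxPackages is assumed to be what "the stable kernel" refers to (the nixpkgs default).

```nix
{ config, pkgs, ... }:
{
  # Driver branch: the report mentions trying both stable and beta.
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.stable;
  # hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.beta;

  # Kernel module flavour: false selects the proprietary module, true the open one.
  hardware.nvidia.open = false;

  # Assumed form of the revert: go back to the default kernel package set.
  boot.kernelPackages = pkgs.linuxPackages;
}
```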

The multi-user value in the metadata is set to no because I had to run the nix-info command from the command-line interface.

Notify maintainers

@Kiskae
@edwtjo

Metadata

 - system: "x86_64-linux"
 - host os: Linux 6.11.0, NixOS, 24.11 (Vicuna), 24.11.20240919.c04d565
 - multi-user?: no
 - sandbox: yes
 - version: nix-env (Nix) 2.18.5
 - nixpkgs: /nix/store/hiasfhl8f5yy88hcfbr3s8s4bm63wsjw-source

Kiskae commented 1 month ago

Could you use journalctl -b -<n> (for example, journalctl -b -1 for the previous boot) to check the system logs of earlier boots and see whether anything related to sddm shows up? Unless an error is logged somewhere, I'm not sure where to start looking.

ccicnce113424 commented 1 month ago

Sep 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: Starting Wayland process "/nix/store/yxy38krm4jpq9f4xbb3i31bszyp5dvv3-kwin-6.1.5/bin/kwin_wayland --no-global-shortcuts --no-kactivities --no-lockscreen --locale1" "sddm"
Sep 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: started succesfully "/nix/store/yxy38krm4jpq9f4xbb3i31bszyp5dvv3-kwin-6.1.5/bin/kwin_wayland --no-global-shortcuts --no-kactivities --no-lockscreen --locale1"
Sep 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: "No backend specified, automatically choosing drm\n"
Sep 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: Directory "/run/user/175" has changed, checking for Wayland socket
Sep 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: Found Wayland socket "/run/user/175/wayland-0"
Sep 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: "Accepting client connections on sockets: QList(\"wayland-0\")\n"
Sep 22 22:49:03 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_scene_opengl: No render nodes have been found, falling back to primary node\n"
Sep 22 22:49:04 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated. <image> and <target> are incompatible\n"
Sep 22 22:49:04 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Failed to create framebuffer: Invalid argument\n"
Sep 22 22:49:05 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Failed to create framebuffer: Invalid argument\n"
Sep 22 22:49:08 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Presentation failed! Invalid argument\n"
Sep 22 22:49:09 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_core: Applying output config failed!\n"
Sep 22 22:49:09 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Failed to create framebuffer: Invalid argument\n"
Sep 22 22:49:09 ccic-desktop sddm-helper-start-wayland[1619]: "kwin_wayland_drm: Presentation failed! Permission denied\n"

inclyc commented 1 month ago

Same issue here, I found workaround trick here.

Kiskae commented 1 month ago

Same issue here, I found workaround trick here.

That specifically appears to be about the GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT + GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT errors.

The OP issue meanwhile looks like the wayland server and the compositor disagreeing about which drm node is the render node.

If the OP still has the issue, I've got some more diagnostics I could use:

  1. Anything in the journal related to OpenGL. Looking at similar issues, kwin_scene_opengl seems to print a lot of information about the driver it is using when errors occur.
  2. While experiencing the issue, press Ctrl+Alt+F2 to switch to a virtual console, log in, and run nix shell nixpkgs#libdrm^bin -c drmdevice to get some information about the current DRM nodes.
     2a. At this point you might want to try sudo systemctl restart graphical.target to manually restart the GUI and see if it starts working.

ccicnce113424 commented 1 month ago

Same issue here, I found workaround trick here.

I'd like to try this trick, but I am a beginner with NixOS, and I don't know how to apply patches to the open kernel module.

inclyc commented 1 month ago

Hi @ccicnce113424,

I'd like to try this trick, but I am a beginner with NixOS, and I don't know how to apply patches to the open kernel module.

I added the kernel params according to pbo's reply to my configuration.

using initcall_blacklist=simpledrm_platform_driver_init: simpledrm isn't loaded, tty is black with [drm] User-defined mode not supported: "1920x1080", but if I enter login, password and launch Hyprland blindly it works.

boot.kernelParams = [
  "initcall_blacklist=simpledrm_platform_driver_init"
];

And I can confirm that the tty is black (sad) but the desktop environment (kde-wayland for me) works.

ccicnce113424 commented 1 month ago

I tried it and the result was exactly the same. So this should be an error in the NVIDIA kernel module, unrelated to SDDM, KDE, and KWin.

ccicnce113424 commented 1 month ago

Same issue here, I found workaround trick here.

I'd like to try this trick, but I am a beginner with NixOS, and I don't know how to apply patches to the open kernel module.

I created a patch using the following commands from the pull request submitted by leigh123linux:

git clone https://github.com/leigh123linux/open-gpu-kernel-modules.git -b 611_drm_change 
cd open-gpu-kernel-modules 
git diff HEAD^1 > kernel-modules.patch

Then I applied the patch with the following settings:

hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.beta.overrideAttrs {
  open = config.boot.kernelPackages.nvidiaPackages.beta.open.overrideAttrs {
    patches = [ ./kernel-modules.patch ];
  };
};

This change did not have any effect; SDDM still does not work properly.

blakeashleyjr commented 1 month ago

For those looking for a working system until this is fixed without going back too far in kernel versions, pinning to kernel version 6.10.11 resolves the issue for me:

# Base the override on the matching 6.10 series rather than 5.10
# (assumes pkgs.linux_6_10 is still present in your nixpkgs revision).
boot.kernelPackages = pkgs.linuxPackagesFor (pkgs.linux_6_10.override {
  argsOverride = rec {
    src = pkgs.fetchurl {
      url = "mirror://kernel/linux/kernel/v6.x/linux-${version}.tar.xz";
      sha256 = "+02gRvjBhRWfRTfe2IejCsxp2RxVWg/3+rxFIPWaMJY=";
    };
    version = "6.10.11";
    modDirVersion = "6.10.11";
  };
});

Binary-Eater commented 1 month ago

For Linux kernel 6.11, we released a fix in our production branch release 550.120, which uses drm_fbdev_ttm_setup in place of drm_fbdev_generic_setup for kernels 6.11 and above. A future release in the new feature branch will contain this fix as well but we do not have a plan to make a release for this branch in the near future. For reference, please feel free to extract production branch release 550.120 and apply the changes to nvidia-drm as you see fit.

Our forum post detailing this: https://forums.developer.nvidia.com/t/drm-fbdev-wayland-presentation-support-with-linux-kernel-6-11-and-above/307920.

Kiskae commented 1 month ago

^ when that lands and you still experience the issue, make sure you're using the open driver. The proprietary driver does not get the patch.

BenA0 commented 1 month ago

^ when that lands and you still experience the issue, make sure you're using the open driver. The proprietary driver does not get the patch.

I assume this means Pascal (GTX 1000 series) and older cards aren't covered by this patch, since the open modules only support Turing and newer. Until NVIDIA addresses it in the next major release (565?), those cards are stuck on 6.10 kernels.

fpletz commented 1 month ago

You could try to get the patch to apply to the proprietary modules, as @Kiskae mentioned in the PR, but I didn't want to invest more time. PRs are welcome, though. Until then we have to wait for NVIDIA to fix it. Note that the production version has been fixed by NVIDIA, and that fix has been merged into nixpkgs in #344524.
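
For readers who want to try the fixed production branch, a sketch of the relevant options follows. It assumes a nixpkgs revision that already includes #344524; whether nvidiaPackages.production resolves to 550.120 or later depends on that revision. Setting open = true follows the advice earlier in the thread and is only an option on Turing or newer GPUs.

```nix
{ config, ... }:
{
  # Select the production driver branch, which NVIDIA reports as fixed in 550.120.
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;

  # Use the open kernel module (Turing and newer only); set to false for older GPUs.
  hardware.nvidia.open = true;
}
```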