NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

mesa: using a newer mesa version for drivers crashes DEs under wayland #223729

Closed Atemu closed 1 year ago

Atemu commented 1 year ago

Describe the bug

Changing mesa version via hardware.opengl.mesaPackage apparently causes modern DEs and login managers to crash:

https://github.com/NixOS/nixpkgs/issues/223458 https://github.com/NixOS/nixpkgs/issues/223331 https://github.com/NixOS/nixpkgs/issues/223535

https://github.com/NixOS/nixpkgs/pull/223530

Expected behavior

It should "just work".

Additional context

Perhaps this is due to some direct dependence on mesa's libGL. If that's the case, we could make them depend on the vendor-agnostic libGL instead.

If it's due to libgbm, we might have a problem on our hands as that essentially makes hardware.opengl.mesaPackage useless on modern software stacks.

Notify maintainers

@R-VdP @K900 @vcunat

pr0ton11 commented 1 year ago

Hi, could this be prioritized? I cannot boot my workstation with mesa 23 anymore. I can help with logs if you need anything.

Atemu commented 1 year ago

@pr0ton11 wdym by "anymore"? Did it work before?

SuperSandro2000 commented 1 year ago

> Could this be prioritized? I cannot boot my workstation with mesa 23 anymore.

The change that made mesa_23 the default has already been reverted, and the revert will soon be in nixos-unstable.

vcunat commented 1 year ago

I believe it was in all channels already. Historical data are harder to show, but it certainly is at this moment: https://nixpk.gs/pr-tracker.html?pr=223530

And BTW, you can easily switch the default in your config, e.g. in case you temporarily want to use something other than what the latest channels ship.

ElvishJerricco commented 1 year ago

Just stream-of-consciousness'ing our findings in a Matrix channel with @K900.

Symptoms: When GDM starts, the display goes to sleep. I can switch VTs, but once I switch back to the GDM VT, the existing image remains and nothing happens. It is responsive though: I can pretend it's visible and use the keyboard to log in. When I do, the image on the screen remains frozen, but the session is successfully running invisibly.

Gnome Shell seems to be saying this:

Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Running GNOME Shell (using mutter 43.3) as a Wayland display server
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Device '/dev/dri/card0' prefers shadow buffer
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Added device '/dev/dri/card0' (amdgpu) using atomic mode setting.
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Device '/dev/dri/card1' prefers shadow buffer
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Added device '/dev/dri/card1' (amdgpu) using atomic mode setting.
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Created gbm renderer for '/dev/dri/card0'
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Created gbm renderer for '/dev/dri/card1'
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Boot VGA GPU /dev/dri/card0 selected as primary
Apr 02 13:30:46 jace .gnome-shell-wr[3962]: Failed to allocate onscreen framebuffer for /dev/dri/card0: EGL failed to allocate resources for the requested operation.

Setting services.xserver.displayManager.gdm.wayland = false; gets me to a working desktop though.
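For anyone following along, that fallback can be written as a small NixOS module fragment. This is only a sketch: the option name is the one quoted above, and the surrounding module structure is assumed.

```nix
# Sketch of the X11 fallback described above (NixOS module fragment).
# Only the gdm.wayland option comes from this thread; the rest of your
# configuration is assumed to already enable GDM.
{
  # Make GDM start an Xorg session instead of a Wayland one.
  services.xserver.displayManager.gdm.wayland = false;
}
```

This sidesteps the failing Wayland path rather than fixing the underlying mismatch.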

We found this.

Tried this from here to fix it. No luck. Same behavior.

Sway worked fine.

This might be related.

We wanted to try with MUTTER_DEBUG_SEND_KMS_MODIFIERS=0, but couldn't figure out how to get the environment variable passed into the gnome-shell process. systemd.services.display-manager.environment didn't do it, services.xserver.displayManager.setupCommands didn't do it, and services.xserver.displayManager.job.environment didn't do it.

Rebuilding the world with an overlay that sets mesa = final.mesa_23; does result in a working desktop though, so there must be some incompatibility between mesa 22 libs and mesa 23 drivers.
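A minimal sketch of such an overlay, assuming it is applied through nixpkgs.overlays in a NixOS configuration (the mesa_23 attribute is the one named above; where the overlay is wired in is an assumption):

```nix
# Sketch: make every package in nixpkgs build against mesa 23.
# Expect a very large local rebuild, since the binary cache no longer
# matches anything that depends on mesa.
{
  nixpkgs.overlays = [
    (final: prev: {
      # Replace the default mesa for the whole package set, so libraries
      # and drivers come from the same mesa version.
      mesa = final.mesa_23;
    })
  ];
}
```

Because libraries and drivers then come from the same version, this avoids the mix of 22 libs and 23 drivers, at the cost of recompiling.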

pr0ton11 commented 1 year ago

> @pr0ton11 wdym by "anymore"? Did it work before?

No, it didn't. I am using the workaround of setting mesa to 22 atm, awaiting a fix for mesa 23 (I would profit from this because I use a new AMD card).

K900 commented 1 year ago

I think we'll just have to go upstream with this, unless someone with the requisite knowledge can get their hands on an affected device and debug locally...

K900 commented 1 year ago

Also, just for statistics' sake, if you are affected, can you please post your specs (specifically all GPUs, even if you don't use some of them), kernel version and which compositors are affected (Sway seems to be a good test to see if it's all of them or only some).

r-vdp commented 1 year ago

> Also, just for statistics' sake, if you are affected, can you please post your specs (specifically all GPUs, even if you don't use some of them), kernel version and which compositors are affected (Sway seems to be a good test to see if it's all of them or only some).

I was affected. I run an Intel Core i7-1260P with integrated graphics, no other GPU in the system. Latest kernel from unstable, so 6.2.8. Standard GNOME DE with Wayland.

Atemu commented 1 year ago

Also very important: whether you are running Wayland or Xorg.

CobaltCause commented 1 year ago

Plasma Wayland was broken for me (didn't try sddm, I use tuigreet). River and Hyprland both work properly. Didn't try anything else.

Kernel: 6.2.7
iGPU: 7950X
dGPU: 7900 XTX

asininemonkey commented 1 year ago

For what it's worth, you can simulate this breakage in a Parallels VM on macOS. I'm running unstable with mesa_22 both in my VM and on my home system (AMD 5900X CPU, 6800XT GPU):

Kernel: 6.2.8
GNOME: 43.3
Wayland: 1.21.0

Parallels Driver Details:

VkPhysicalDeviceDriverProperties:
---------------------------------
    driverID        = DRIVER_ID_MESA_LLVMPIPE
    driverName      = llvmpipe
    driverInfo      = Mesa 22.3.7 (LLVM 15.0.7)
    conformanceVersion:
        major    = 1
        minor    = 3
        subminor = 1
        patch    = 1

baduhai commented 1 year ago

If I run anything other than hardware.opengl.mesaPackage = pkgs.mesa_22, Plasma Wayland doesn't work; I just get a black screen. This is reproducible both on my Ryzen desktop with an RX 6700X GPU and on my Intel laptop using the i5-10210U iGPU.
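For reference, the pinned setup described here would look roughly like this as a module fragment. It is a sketch using only the option and attribute named in this thread; note that the option was later reverted, so it only applies to nixpkgs revisions that still have it.

```nix
# Sketch: pin the driver package to mesa 22 via the option discussed in
# this issue. Only valid on nixpkgs revisions that still provide
# hardware.opengl.mesaPackage.
{ pkgs, ... }:
{
  hardware.opengl.mesaPackage = pkgs.mesa_22;
}
```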

Atemu commented 1 year ago

Does Xorg work?

K900 commented 1 year ago

Can someone test https://github.com/NixOS/nixpkgs/pull/225192 ?

ElvishJerricco commented 1 year ago

@K900 I just tested it. I still get the same behavior.

K900 commented 1 year ago

So it looks like upstream just flat out doesn't support what we're doing anymore: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20069

I have a cursed idea though...

vcunat commented 1 year ago

Let's close this. The mesaPackage option was reverted in PR #225325.

There will most likely be some related followups. I even fear that at some point the mismatch between libgbm and drivers might be an issue (as libgbm is pure now and drivers from the OS, by default).

devmattrick commented 1 year ago

Is there an alternative/workaround for those of us who need more recent Mesa drivers? RDNA 3 cards don't run the best on Mesa 22 in my experience, but 23 ran fine (well, until this issue of course). The impression I'm getting is that there's not really anything I can do to get Mesa 23 running with most DEs?

ElvishJerricco commented 1 year ago

@devmattrick No, there isn't really a workaround, because the issue is that mesa upstream doesn't support mixing different versions anymore, i.e. there will be bugs if you try to mix 22 and 23 like this. The only thing you can really do is add an overlay causing all of nixpkgs to use 23, which requires you to recompile a ton of stuff.

You might be able to get away with system.replaceRuntimeDependencies, but I don't think anyone's tested that.
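An untested sketch of what that could look like, assuming the list-of-pairs form of the option with original and replacement attributes. Whether it covers all of mesa's outputs (for example the drivers) is not something anyone in this thread has verified.

```nix
# Untested sketch: swap mesa 22 for mesa 23 in the system closure by
# rewriting store paths instead of rebuilding dependents.
# Assumption: both derivations have names of the same length, which this
# option requires; only the default output is replaced here.
{ pkgs, ... }:
{
  system.replaceRuntimeDependencies = [
    {
      original = pkgs.mesa;
      replacement = pkgs.mesa_23;
    }
  ];
}
```

If it works at all, the appeal is that store paths are rewritten in place, so the bulk of nixpkgs does not have to be recompiled.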

vcunat commented 1 year ago

Speaking of 23 in particular, you should be able to run the staging-next branch right now. (It has basically all binaries and probably will merge soon to unstable/master anyway.)

pr0ton11 commented 1 year ago

@vcunat Could you add another comment here when it's merged, so I get pinged?

Atemu commented 1 year ago

@pr0ton11 https://nixpk.gs/pr-tracker.html?pr=223238

vcunat commented 1 year ago

In this case you can also subscribe to #224806 (e.g. click "Customize" and subscribe only to the closing event).

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/amd-rx-7700-xt-not-being-detected-properly/33683/5