NVIDIA / egl-wayland

The EGLStream-based Wayland external platform
MIT License

[BUG] Ver 1.1.8-1 breaks plasma wayland #40

Closed nomantis closed 2 years ago

nomantis commented 3 years ago

The bug

After upgrading from 1.1.7-1 to 1.1.8-1 and logging in through SDDM with the Plasma Wayland session selected, the desktop is completely missing. The cursor is visible and can be moved, and moving it to the top-left corner displays the blue desktop switching indicator, but no other visual elements are present and no applications can be started.

Reproducing the bug

OS: Arch Linux

Install SDDM, plasma-desktop and nvidia-dkms. Downgrade egl-wayland to version 1.1.7-1 and reboot: the desktop works. Upgrade egl-wayland to version 1.1.8-1 and reboot: the behavior described above occurs.

Haven't checked how to debug this yet, any help would be appreciated!

erik-kz commented 3 years ago

Acknowledged, I noticed this too. It looks like the buffer release thread added in this commit https://github.com/NVIDIA/egl-wayland/commit/e95bba3e09664e48749c3b036a98336224d5bc1b is interfering with Qt's event loop, which can cause QtQuick applications to not work properly (plasma makes heavy use of QtQuick).

The issue may need to be addressed on the Qt side. I have an upstream patch prepared that I plan to post shortly, and I'll update here with any progress.

nomantis commented 3 years ago

Thanks, that would be great! Let me know if I can help with anything.

erik-kz commented 3 years ago

Let's see how this is received https://codereview.qt-project.org/c/qt/qtwayland/+/373473

erik-kz commented 3 years ago

If you feel like it, you could try building qtwayland with that patch and see if it resolves your issue. It does appear to work for me.

arenekosreal commented 3 years ago

> If you feel like it, you could try building qtwayland with that patch and see if it resolves your issue. It does appear to work for me.

The problem still exists even after rebuilding qt5-wayland with your patch, but the wayland-session log changes from

error in client communication (pid 1342)
[destroyed object]: error 7: importing the supplied dmabufs failed

to

ProtocolException thrown:Failed to write all data

Full wayland-session log files generated by SDDM before and after patch: https://gist.github.com/zhanghua000/084d3f3bca2a9f173b409adff7138f5e
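
For anyone comparing their own logs against the two excerpts above, here is a minimal Python sketch that counts each signature in a wayland-session log. The signature strings are copied from the excerpts; the function name `classify_log` and the dictionary labels are hypothetical. Matching a signature only shows that the line is present, not what caused it.

```python
import re

# Patterns copied from the two log excerpts above; the labels are made up for this sketch.
SIGNATURES = {
    "dmabuf_import_failed": re.compile(r"error 7: importing the supplied dmabufs failed"),
    "protocol_write_failed": re.compile(r"ProtocolException thrown:\s*Failed to write all data"),
}


def classify_log(text):
    """Count how many log lines match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Read the log that SDDM writes (its location depends on the setup, so the sketch takes the contents as text) and call `classify_log(text)` before and after applying the patch.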

hornedfiend commented 3 years ago

+1. Having the same issue. Confirmed package downgrade to 1.1.7 fixes it.

Dnnd commented 3 years ago

It appears to me that 1.1.8 has a major bug.

Egl-wayland 1.1.8 breaks the following applications:

I'm using nvidia with PRIME Render Offloading only. Bug occurs even when the application should be launched on Intel UHD (i.e. without __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only __GLX_VENDOR_LIBRARY_NAME=nvidia).

Downgrade to 1.1.7 fixes the problem

System & Graphics Information

```
System:
  Kernel: 5.14.8-zen1-1-zen x86_64 bits: 64 Desktop: sway 1.6.1 Distro: Arch Linux
Graphics:
  Device-1: Intel CoffeeLake-H GT2 [UHD Graphics 630] driver: i915 v: kernel
  Device-2: NVIDIA GP107M [GeForce GTX 1050 Mobile] driver: nvidia v: 470.74
  Device-3: IMC Networks USB2.0 HD UVC WebCam type: USB driver: uvcvideo
  Display: wayland server: X.Org 1.21.1.2 compositor: sway driver: loaded: modesetting,nvidia
    resolution: 1: 1920x1080~60Hz 2: 1920x1200~60Hz
  OpenGL: renderer: Mesa Intel UHD Graphics 630 (CFL GT2) v: 4.6 Mesa 21.2.2
```

erik-kz commented 3 years ago

Unfortunately it looks like, apart from the Qt issue mentioned above, there's a more general incompatibility between the current 470 driver and version 1.1.8 of this library. On pre-Turing GPUs, no Wayland EGL applications will be able to present anything at all. Sorry for letting this slip through, it was a testing oversight on our part.

This will be fixed in the upcoming 495 driver. Until then, downgrading to 1.1.7 is the best option.
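
The combination described in this comment can be written down as a small check. This is only a sketch in Python that encodes the statement literally (470-series driver, egl-wayland 1.1.8, pre-Turing GPU); the function names `leading_version` and `known_broken` are hypothetical, and whether a GPU is pre-Turing has to be supplied by the caller. It says nothing about other combinations or the separate Qt issue.

```python
import re


def leading_version(text):
    """Parse the leading dotted number of a version string, e.g. '1.1.8-1' -> (1, 1, 8)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def known_broken(egl_wayland, driver, pre_turing):
    """True only for the combination reported above as unable to present."""
    egl = leading_version(egl_wayland)[:3]
    driver_major = leading_version(driver)[0]
    return egl == (1, 1, 8) and driver_major == 470 and pre_turing
```

For example, `known_broken("1.1.8-1", "470.74", True)` matches the reported case, while the same call with `"1.1.7-1"` does not.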

hkaancaliskan commented 3 years ago

@erik-kz Is there an ETA for the 495 driver? Partial updates are not recommended by Arch Linux, so I've halted all updates. It's going to be released in the near future, right?

erik-kz commented 3 years ago

I believe the estimate for the beta release is October 11th, and Arch Linux is usually pretty quick about getting it into their repos.

mxre commented 3 years ago

> Unfortunately it looks like, apart from the Qt issue mentioned above, there's a more general incompatibility between the current 470 driver and version 1.1.8 of this library. On pre-Turing GPUs, no Wayland EGL applications will be able to present anything at all. Sorry for letting this slip through, it was a testing oversight on our part.
>
> This will be fixed in the upcoming 495 driver. Until then, downgrading to 1.1.7 is the best option.

So you are saying future versions of egl-wayland will need driver version 495 or newer? Where does this leave users of Kepler? It was reported that 470 would be the last version to support that generation of GPU. Will a future version of egl-wayland be fixed to work with the 470 version, or will the driver fix be backported?

brogers-propstream commented 3 years ago

> Unfortunately it looks like, apart from the Qt issue mentioned above, there's a more general incompatibility between the current 470 driver and version 1.1.8 of this library. On pre-Turing GPUs, no Wayland EGL applications will be able to present anything at all. Sorry for letting this slip through, it was a testing oversight on our part.
>
> This will be fixed in the upcoming 495 driver. Until then, downgrading to 1.1.7 is the best option.

I'm on a 3070 Max-Q, which is decidedly post-Turing, yet I still have the issue, as well as multiple issues on the GNOME desktop environment, which I detailed here: https://github.com/NVIDIA/egl-wayland/issues/41

erik-kz commented 3 years ago

That looks like a different issue specific to hybrid-graphics systems.

gardotd426 commented 2 years ago

It seems like the black screen issue is fixed with 495, but it's completely unusable (this wasn't an issue before). System Settings won't open, even the launcher takes 30 seconds to open/respond (I have a 5900X and an RTX 3090). I get errors in SDDM's wayland-session.log like:

ERROR: Unable to find display on any available system
libGL error: failed to create dri screen
libGL error: failed to load driver: nouveau
libGL error: failed to create dri screen
libGL error: failed to load driver: nouveau
libGL error: failed to create dri screen
libGL error: failed to load driver: nouveau
Error getting buffer
Error getting buffer
Error getting buffer
Error getting buffer
Error getting buffer
XIO:  fatal IO error 22 (Invalid argument) on X server ":1"
      after 76 requests (76 known processed) with 0 events remaining.
klauncher: Exiting on signal 1
XIO:  fatal IO error 22 (Invalid argument) on X server ":1"
      after 16 requests (16 known processed) with 0 events remaining.
kdeinit5_wrapper: Warning: connect(/run/user/1000/kdeinit5__1) failed: : No such file or directory
Error: Can not contact kdeinit5!

Not sure if this is because of the new GBM support or what

ionenwks commented 2 years ago

Which egl-wayland? Note that 495.29.05 ships with egl-wayland-1.1.9 which isn't available here yet, and likely has additional fixes. Not that I've tried plasma yet; I did get accelerated sway to work though.

gardotd426 commented 2 years ago

I'm using 1.1.9:

/usr/lib/libnvidia-egl-wayland.so.1.1.9

I got accelerated sway to work, but it took:

Vulkan doesn't work (outside vulkaninfo), glxgears doesn't work, but my Nvidia GPU is correctly reported in glxinfo. vkcube and native Vulkan games/applications segfault.


ionenwks commented 2 years ago

vulkan worked for me (vkcube), but haven't messed much more with it -- I did have a missing cursor so that NO_HARDWARE_CURSORS=1 makes sense. I'm using a single gpu so that likely helps (i.e. didn't need GBM_BACKEND/__GLX).

gardotd426 commented 2 years ago

Everyone on the Linux Gaming Dev discord that's gotten sway to work has also said vkcube (and all other vulkan) breaks. I'm also using a single GPU. 

What's weird is that on previous drivers, vulkaninfo | grep wayland reported the following:

VK_KHR_wayland_surface                 : extension revision 6

Now, I get:

VK_KHR_wayland_surface                 : extension revision 6
    VK_KHR_wayland_surface = false
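
To check for that second line without eyeballing the grep output, something like the following Python sketch could be used. It only looks for a `VK_KHR_wayland_surface = true/false` line in text captured from vulkaninfo; the function name `wayland_surface_flag` is hypothetical, and what the flag actually signifies is not established in this thread.

```python
import re


def wayland_surface_flag(vulkaninfo_text):
    """Return True/False if a 'VK_KHR_wayland_surface = <bool>' line exists, else None."""
    match = re.search(
        r"^\s*VK_KHR_wayland_surface\s*=\s*(true|false)\s*$",
        vulkaninfo_text,
        re.MULTILINE,
    )
    if match is None:
        return None  # Only the extension revision line (or nothing) was found.
    return match.group(1) == "true"
```

With the older output shown above it returns `None`, because only the extension revision line is present; with the newer output it returns `False`.
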
ionenwks commented 2 years ago

Likely other factors, for example my mesa is built without vulkan support so maybe that prevents it from conflicting -- but I'll have a closer look again later (it was just a quick test). Edit: was also still using egl-wayland-1.1.8 fwiw -- and forgot to say I was on my 2nd compositor test when I ran vkcube (hikari), but it's still wlroots so unsure if it would make a difference. Was also missing a cursor there.

cubanismo commented 2 years ago

Please don't make this a catch-all for wayland issues. If there are other issues with other programs, file separate issues for each of them. The issues with plasma are well understood and a patch to address the problem in Qt has been posted, as noted above.

cauebs commented 2 years ago

> Telegram desktop - crashes after click on the image attachments

Still getting this on 1.1.9.r0.gcd0d19a with 495.29.05 drivers.

tgurr commented 2 years ago

https://codereview.qt-project.org/c/qt/qtwayland/+/301712 was merged upstream today, resulting in a merge conflict of the mentioned (and working) https://codereview.qt-project.org/c/qt/qtwayland/+/373473.

I gave https://codereview.qt-project.org/c/qt/qtwayland/+/301712 a go backporting it, but it results in the known "black screen with mouse cursor bug" so the patch from NVIDIA needs to be reworked and resubmitted/updated upstream as distributions usually try to stick to upstream solutions.

VarLad commented 2 years ago

Any update on this? This affects GNOME users as well

cubanismo commented 2 years ago

@VarLad, this issue specifically affects QtQuick applications. Are you saying you're running a QtQuick application under GNOME and are thus affected? If so, yes, that's expected. If not, it's a different issue, and needs a separate issue filed.

tngTUDOR commented 2 years ago

A workaround on Fedora 35, when the NVIDIA drivers are installed from RPM Fusion, is to downgrade to egl-wayland 1.1.7 as follows:

sudo dnf downgrade egl-wayland

which would bring:

Name        : egl-wayland
Version     : 1.1.7
Release     : 2.fc35
Architecture: x86_64
Install Date: Fri 12 Nov 2021 08:54:09 CET
Group       : Unspecified
Size        : 59774
License     : MIT
Signature   : RSA/SHA256, Sun 25 Jul 2021 19:54:29 CEST, Key ID db4639719867c58f
Source RPM  : egl-wayland-1.1.7-2.fc35.src.rpm
Build Date  : Thu 22 Jul 2021 00:45:23 CEST
Build Host  : buildvm-x86-08.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://github.com/NVIDIA/egl-wayland
Bug URL     : https://bugz.fedoraproject.org/egl-wayland
Summary     : Wayland EGL External Platform library
Description :
Wayland EGL External Platform library

and


Name        : akmod-nvidia
Epoch       : 3
Version     : 470.74
Release     : 1.fc35
Architecture: x86_64
Install Date: Fri 12 Nov 2021 08:54:13 CET
Group       : Unspecified
Size        : 22918
License     : Redistributable, no modification permitted
Signature   : RSA/SHA1, Wed 22 Sep 2021 09:15:32 CEST, Key ID 6a2af96194843c65
Source RPM  : nvidia-kmod-470.74-1.fc35.src.rpm
Build Date  : Tue 21 Sep 2021 10:53:19 CEST
Build Host  : buildvm-01.online.rpmfusion.net
Packager    : RPM Fusion
Vendor      : RPM Fusion
URL         : http://www.nvidia.com/
Summary     : Akmod package for nvidia kernel module(s)
Description :
This package provides the akmod package for the nvidia kernel modules.
colemickens commented 2 years ago

Can someone from NVIDIA weigh in on this claim from earlier:

> Note that 495.29.05 ships with egl-wayland-1.1.9 which isn't available here yet,

That doesn't really inspire confidence, in my opinion. I get it, and I'm thankful things are changing, but it'd be really nice if NVIDIA's shipped binary bits were easily tracked to a branch/commit here.

cubanismo commented 2 years ago

> Can someone from NVIDIA weigh in on this claim from earlier:
>
> > Note that 495.29.05 ships with egl-wayland-1.1.9 which isn't available here yet,
>
> That doesn't really inspire confidence, in my opinion. I get it, and I'm thankful things are changing, but it'd be really nice if NVIDIA's shipped binary bits were easily tracked to a branch/commit here.

1.1.9 was posted shortly after the release. Generally code arrives here before the corresponding driver release, but the changes contained in 1.1.9 went in rather late in our internal release process and hence showed up in binary form before I'd had a chance to push them here. They weren't deliberately held back. IIRC, I just didn't have time to do the additional build testing with the public build systems the day I checked it in internally and then, to be honest, forgot about that task for a few days.

tgurr commented 2 years ago

> https://codereview.qt-project.org/c/qt/qtwayland/+/301712 was merged upstream today, resulting in a merge conflict of the mentioned (and working) https://codereview.qt-project.org/c/qt/qtwayland/+/373473.
>
> I gave https://codereview.qt-project.org/c/qt/qtwayland/+/301712 a go backporting it, but it results in the known "black screen with mouse cursor bug" so the patch from NVIDIA needs to be reworked and resubmitted/updated upstream as distributions usually try to stick to upstream solutions.

https://invent.kde.org/qt/qt/qtwayland/-/merge_requests/24 has since been accepted into the kde/5.15 branch (https://invent.kde.org/qt/qt/qtwayland/-/commits/kde/5.15), which is maintained by the KDE project. That branch is effectively the Qt5 "upstream" for the time being, and many distributions now pick their patches from it, because Qt 5.15 LTS releases are only available to commercial Qt customers. In short, patches are now available that solve this issue and allow a working Plasma Wayland session again for NVIDIA users. Fedora for example already pulled the patches in https://src.fedoraproject.org/rpms/qt5-qtwayland/c/770c4ae5bb6d2bb7d2e4659d0fa1c822a005fe8e.

ionenwks commented 2 years ago

> Fedora for example already pulled the patches in https://src.fedoraproject.org/rpms/qt5-qtwayland/c/770c4ae5bb6d2bb7d2e4659d0fa1c822a005fe8e.

Gentoo as well

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fd8e9a8acb5a3db96f416ff50eddb94b9550ef86

nomantis commented 2 years ago

The bug seems to have been solved. I think we can close this issue.