NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

element-desktop-1.11.33 crashes on startup with Wayland #238416

Open talex5 opened 1 year ago

talex5 commented 1 year ago

Describe the bug

After updating to the latest element-desktop-wayland, it crashes on startup.

Steps To Reproduce

Just run it:

> element-desktop
/home/tal/.config/Element exists: yes
/home/tal/.config/Riot exists: no
LaunchProcess: failed to execvp:
xdg-settings
[13821:0618/140708.779432:ERROR:object_proxy.cc(590)] Failed to call method: org.freedesktop.DBus.Properties.Get: object_path= /org/freedesktop/portal/desktop: org.freedesktop.DBus.Error.InvalidArgs: No such interface “org.freedesktop.portal.FileChooser”
[13821:0618/140708.779489:ERROR:select_file_dialog_linux_portal.cc(274)] Failed to read portal version property
No update_base_url is defined: auto update is disabled
Fetching translation json for locale: en_EN
Changing application language to en
Fetching translation json for locale: en
Resetting the UI components after locale change
Resetting the UI components after locale change
Changing application language to en
Fetching translation json for locale: en
Resetting the UI components after locale change
fish: Job 1, 'element-desktop' terminated by signal SIGSEGV (Address boundary error)

The messages displayed are the same for the working and failing versions, except for the SIGSEGV line in the failing one.

coredumpctl debug doesn't seem helpful:

Core was generated by `/nix/store/ki972q6dz6r319d4ibcpi71g9s5w90cs-electron-25.1.1/lib/electron/.elect'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000556a03e02874 in ?? ()
[Current thread is 1 (Thread 0x7fe8c5abf4c0 (LWP 13821))]
(gdb) bt
#0  0x0000556a03e02874 in ?? ()
#1  0x0000000000000000 in ?? ()

Notify maintainers

@ma27 @fadenb @mguentner @ekleog @ralith @dandellion @sumnerevans

Metadata

tal@bree> nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.34, NixOS, 23.05 (Stoat), 23.05.20230616.c7ff1b9`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.3`
 - channels(tal): `""`
 - channels(guest): `""`
 - channels(root): `"nixos-23.05"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

Commit 9b19034f7d0 doesn't work, while the previous commit does.

sagehane commented 1 year ago

I've also had the same issue.

RyanGibb commented 1 year ago

Same issue here. It looks like the only upstream change is a bump to matrix-react-sdk https://github.com/vector-im/element-desktop/compare/v1.11.32...v1.11.33

lilyinstarlight commented 1 year ago

Related to Electron 25 bump in #237070. See vector-im/element-desktop#1026 for upstream issue

nazarewk commented 1 year ago

is there any workaround for it yet?

lilyinstarlight commented 1 year ago

is there any workaround for it yet?

I've just been running NIXOS_OZONE_WL= element-desktop and dealing with the blurriness. I did consider chasing down the issue and either making an Electron patch or Element patch to fix it, but I have not had the time
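The workaround above can be wrapped in a small shell function. This is a minimal sketch, not something shipped by nixpkgs: `run_without_ozone_wl` is a hypothetical helper name, and it assumes (as the workaround implies) that an empty `NIXOS_OZONE_WL` is treated the same as an unset one by the package's wrapper.

```shell
#!/bin/sh
# Run a command with NIXOS_OZONE_WL forced to the empty string for that one
# invocation only, leaving the variable untouched in the calling shell.
# `run_without_ozone_wl` is a hypothetical helper name, not part of nixpkgs.
run_without_ozone_wl() {
    # env(1) applies the assignment only to the child process.
    env NIXOS_OZONE_WL= "$@"
}

# Usage (commented out so that sourcing this file launches nothing):
# run_without_ozone_wl element-desktop
```

Using `env` rather than exporting the variable keeps the rest of the session on Wayland-native Electron, so only the affected app falls back to XWayland (and its blurriness).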

fogti commented 1 year ago

I also encountered a similar problem, element-desktop doesn't crash (on wayland) but does not display anything (23.11.20230621.faee04a). The last known working version is 23.11.20230606.0ce0c73; running it results in "DRI driver not from this Mesa build ('23.1.2' vs '23.1.1') failed to bind extensions" warnings, but it works.

lilyinstarlight commented 1 year ago

DRI driver not from this Mesa build ('23.1.2' vs '23.1.1') failed to bind extensions

You might need to log out and log back in after updating or restart. That message means you've activated a config with Mesa 23.1.2 but tried to run element-desktop compiled against Mesa 23.1.1 (e.g. the element-desktop before you updated, possibly from old cached path/desktop entry in your DE)

fogti commented 1 year ago

You might need to log out and log back in after updating or restart. That message means you've activated a config with Mesa 23.1.2 but tried to run element-desktop compiled against Mesa 23.1.1 (e.g. the element-desktop before you updated, possibly from old cached path/desktop entry in your DE)

well, that's not the issue, I just run the last known good executable, which is built against an older version of nixpkgs... and somehow appears to hardcode the DRI driver location...

lilyinstarlight commented 1 year ago

Oh I didn't read your initial message very well. Do you get the "doesn't crash but doesn't display anything" behavior with element-desktop from the same nixpkgs revision your system is built with then?

(Also, Mesa on NixOS impurely loads DRI drivers from /run/opengl-driver, so for software that requires graphical acceleration you need to use the same Mesa version the program was compiled with. Mesa recently stopped supporting mixing versions like that, unfortunately.)
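One way to check for that kind of mismatch is to compare the store path the launcher actually resolves to with what the running system exposes. A rough sketch, assuming `readlink -f` is available; `resolve_cmd` is a hypothetical helper name, and the two usage lines are only suggestions:

```shell
#!/bin/sh
# Print the fully resolved path of a command found on PATH, following
# symlinks (e.g. from a profile into /nix/store). Returns non-zero if the
# command is not found. `resolve_cmd` is a hypothetical helper name.
resolve_cmd() {
    path=$(command -v "$1") || return 1
    readlink -f "$path"
}

# Usage (commented out so that sourcing this file runs nothing):
# resolve_cmd element-desktop      # which store path is actually launched
# readlink -f /run/opengl-driver   # which driver set the running system exposes
```

If the first path belongs to an older generation than the active system, a stale desktop entry or cached PATH is a likely culprit.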

raboof commented 1 year ago

a similar problem, element-desktop doesn't crash (on wayland) but does not display anything

I also have this variation of the problem: element-desktop starts, shows notifications, shows a tray icon, and even shows the 'show/hide or quit' window when right-clicking the tray icon, but does not show the main window. I tried to bisect but it seems it (also) depends on some state on the filesystem. NIXOS_OZONE_WL= also works around it for me.

The console shows a bunch of errors around the GPU, but I haven't checked whether those are also there when it does work:

```
[18594:0626/140645.390284:ERROR:gbm_wrapper.cc(258)] Failed to export buffer to dma_buf: No such file or directory (2)
[18594:0626/140645.390414:ERROR:gbm_pixmap_wayland.cc(75)] Cannot create bo with format= RGBA_8888 and usage=SCANOUT
[18594:0626/140645.395204:ERROR:gbm_wrapper.cc(258)] Failed to export buffer to dma_buf: No such file or directory (2)
[18594:0626/140645.395326:ERROR:gbm_pixmap_wayland.cc(75)] Cannot create bo with format= RGBA_8888 and usage=GPU_READ
[18594:0626/140645.395428:ERROR:shared_image_factory.cc(673)] CreateSharedImage: could not create backing.
[18594:0626/140645.395491:ERROR:shared_image_factory.cc(527)] DestroySharedImage: Could not find shared image mailbox
[18594:0626/140645.395612:ERROR:gpu_service_impl.cc(1010)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
[18601:0626/140645.398349:ERROR:command_buffer_proxy_impl.cc(128)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
[18538:0626/140645.416222:ERROR:gpu_process_host.cc(954)] GPU process exited unexpectedly: exit_code=8704
[18644:0626/140645.540218:ERROR:gbm_wrapper.cc(258)] Failed to export buffer to dma_buf: No such file or directory (2)
[18644:0626/140645.540437:ERROR:gbm_pixmap_wayland.cc(75)] Cannot create bo with format= RGBA_8888 and usage=SCANOUT
[18644:0626/140645.547579:ERROR:gbm_wrapper.cc(258)] Failed to export buffer to dma_buf: No such file or directory (2)
[18644:0626/140645.547741:ERROR:gbm_pixmap_wayland.cc(75)] Cannot create bo with format= RGBA_8888 and usage=GPU_READ
[18644:0626/140645.547865:ERROR:shared_image_factory.cc(673)] CreateSharedImage: could not create backing.
[18644:0626/140645.547970:ERROR:shared_image_factory.cc(527)] DestroySharedImage: Could not find shared image mailbox
[18644:0626/140645.548152:ERROR:gpu_service_impl.cc(1010)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
[18538:0626/140645.571907:ERROR:gpu_process_host.cc(954)] GPU process exited unexpectedly: exit_code=8704
[18669:0626/140645.652603:ERROR:gbm_wrapper.cc(258)] Failed to export buffer to dma_buf: No such file or directory (2)
[18669:0626/140645.652776:ERROR:gbm_pixmap_wayland.cc(75)] Cannot create bo with format= RGBA_8888 and usage=SCANOUT
[18669:0626/140645.656626:ERROR:gbm_wrapper.cc(258)] Failed to export buffer to dma_buf: No such file or directory (2)
[18669:0626/140645.656780:ERROR:gbm_pixmap_wayland.cc(75)] Cannot create bo with format= RGBA_8888 and usage=GPU_READ
[18669:0626/140645.656919:ERROR:shared_image_factory.cc(673)] CreateSharedImage: could not create backing.
[18669:0626/140645.657126:ERROR:shared_image_factory.cc(527)] DestroySharedImage: Could not find shared image mailbox
[18669:0626/140645.657396:ERROR:gpu_service_impl.cc(1010)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
[18538:0626/140645.672915:ERROR:gpu_process_host.cc(954)] GPU process exited unexpectedly: exit_code=8704
[18601:0626/140645.740591:ERROR:command_buffer_proxy_impl.cc(128)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
```

fogti commented 1 year ago

@raboof. I encountered the same behavior.

raboof commented 1 year ago

The console shows a bunch of errors around the GPU, but I haven't checked whether those are also there when it does work

Today it works for me, and indeed no DMA/GPU errors

lheckemann commented 1 year ago

I noticed that it starts quite reliably when I have my laptop on the "performance" power profile, while it fails to start up quite reliably on "powersave". Smells like some sort of race condition...

I've also noticed that it renders pretty slowly and eats a lot of CPU -- which leads me to suspect it's not using GPU-accelerated rendering? No relevant errors on stdout/stderr though.

lilyinstarlight commented 1 year ago

Can everyone share what Wayland compositor they are using, what GPU driver they are using, and what NixOS channel they are using (e.g. unstable or 23.05)?

I'm wondering if the behavior differs based on compositor

I'm on Sway, i915, nixos-unstable

raboof commented 1 year ago

I'm using volare (99% sway, 1% funky stuff) on nixos-unstable. TBH I don't know how to tell which GPU driver I'm using - lspci shows I have an Intel controller using the i915 kernel module and an NVIDIA 3D controller using nouveau.

fogti commented 1 year ago

Sway, radeon, nixos-unstable

talex5 commented 1 year ago

Sway, amdgpu (according to lsmod), NixOS 23.05 for me.

(also: I'm sure it's not related, but for completeness I should mention I have a slightly patched wlroots: see https://github.com/talex5/wayland-proxy-virtwl/issues/55#issuecomment-1564515542)

lheckemann commented 1 year ago

Sway, i915, nixos-23.05

lheckemann commented 1 year ago

Just experienced this reliably independent of the power profile on a new machine (which is however identical to the other one...). Running it under strace to try and work out the problem doesn't reproduce it, so +1 on my previous suspicion that it's a race condition :roll_eyes:

lilyinstarlight commented 1 year ago

It does seem that making a sway window rule to float Element (and also removing the window-state.json file to reset it) lets it work again, but as soon as I make it tiled it crashes

It also seems Element isn't the only app affected by this, so it's definitely an upstream Electron issue (it's always an Electron issue...)

999eagle commented 1 year ago

Just to confirm this, I'm on sway, amdgpu, nixos-unstable (645ff62 to be exact) and also had this issue. Using (element-desktop.override {electron = electron_24;}) fixed it for me, so it's highly likely to be related to Electron itself.

lheckemann commented 1 year ago

Should we change the package to use 24 by default, although that diverges from what upstream uses https://github.com/vector-im/element-desktop/blob/4e69dda7d23028c141dc05467ce4a67f2781dcdb/package.json#L93 ?

dryya commented 1 year ago

While using 24 might work on amdgpu, it seems to still be broken on nvidia (with sway, nixos-unstable). With (element-desktop.override {electron = electron_24;}) I get

MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to open nvidia-drm: /run/opengl-driver/lib/dri/nvidia-drm_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open zink: /run/opengl-driver/lib/dri/zink_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open kms_swrast: /run/opengl-driver/lib/dri/kms_swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open swrast: /run/opengl-driver/lib/dri/swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)

which someone else reported happening with webcord: https://github.com/SpacingBat3/WebCord/issues/433. With --ozone-platform=x11 I get the same error and a black window instead of no window.

The same error was reported in this repo in April: https://github.com/NixOS/nixpkgs/issues/224332#issuecomment-1519033918

Edit: It does seem to work with --disable-gpu though.

CobaltCause commented 1 year ago

Experience report for 1.11.35:

| Driver | Compositor | Is NIXOS_OZONE_WL=1 set? | Was --use-gl=desktop passed? | Does it work? |
| --- | --- | --- | --- | --- |
| NVIDIA | KDE | Yes | No | No, window refuses to open |
| NVIDIA | KDE | Yes | Yes | No, window refuses to open |
| NVIDIA | KDE | No | No | No, window opens but renders a black screen |
| NVIDIA | KDE | No | Yes | Yes |
| amdgpu | Hyprland | Yes | No | No, a transparent window pops open and immediately closes, or a window pops up and things are drawn but interacting with the window in any way kills it |
| amdgpu | Hyprland | Yes | Yes | No, window refuses to open |
| amdgpu | Hyprland | No | No | Yes |
| amdgpu | Hyprland | No | Yes | Yes |
| i915 | Hyprland | Yes | No | No, a transparent window pops open and immediately closes, or a window pops up and things are drawn but interacting with the window in any way kills it |
| i915 | Hyprland | Yes | Yes | No, window refuses to open |
| i915 | Hyprland | No | No | Yes |
| i915 | Hyprland | No | Yes | Yes |

stdout/stderr logs:

nvidia-no-no.txt nvidia-no-yes.txt nvidia-yes-no.txt nvidia-yes-yes.txt

amdgpu-no-no.txt amdgpu-no-yes.txt amdgpu-yes-no.txt amdgpu-yes-yes.txt

The i915 logs are identical to the amdgpu logs so there's no reason to upload those.

Disclaimer: I have no idea what --use-gl=desktop actually does or why it works. It even seems like this is supposed to be the default behavior, if I'm reading this correctly: https://source.chromium.org/chromium/chromium/src/+/main:ui/gl/gl_switches.cc;l=100-101;drc=8bca1335dc7993df8e44307816092f9f9d25d4aa.

Note: If you're looking at this and thinking "but I don't see my compositor on the list, what do I do!?", the answer is the compositor is most likely irrelevant and all you need to focus on is the GPU driver. I only included it in the unlikely event that it does matter.
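For anyone who wants to repeat the report on their own hardware, the four launch variants per driver can be generated mechanically. This is a sketch only: `print_test_matrix` is a hypothetical helper, and it just prints the command lines rather than running them, since each run needs a manual check of whether the window appears.

```shell
#!/bin/sh
# Print the four launch commands from the experience report: NIXOS_OZONE_WL
# set to 1 or to the empty string, with and without --use-gl=desktop.
# `print_test_matrix` is a hypothetical helper; it only prints, it runs nothing.
print_test_matrix() {
    app=${1:-element-desktop}
    for ozone in 1 ""; do
        for flag in "" "--use-gl=desktop"; do
            # ${flag:+ $flag} appends the flag (with a leading space) only when set.
            printf 'NIXOS_OZONE_WL=%s %s%s\n' "$ozone" "$app" "${flag:+ $flag}"
        done
    done
}

print_test_matrix
```

Each printed line can be pasted into a terminal as-is; the optional first argument replaces `element-desktop` with another binary or a full store path.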

Ramblurr commented 1 year ago

I can confirm I can reproduce one of those combinations:

 - Driver: Nvidia
 - Compositor: Hyprland
 - Is NIXOS_OZONE_WL=1 set?: No
 - Was --use-gl=desktop passed?: Yes
 - Does it work: Yes?

$ NIXOS_OZONE_WL= /nix/store/ans88lqlrs559jjab71ccca93db8bni7-element-desktop-1.11.35/bin/element-desktop --use-gl=desktop

Any other combination either results in no window or a black window.

I'm on nixos unstable using the nvidia driver from nvidiaPackages.production.

However I do get horrible flicker.

adamcstephens commented 1 year ago

I'm experiencing the same thing on river, amd and unstable.

Downgrading electron to 24 allows me to use the app for now.

lilyinstarlight commented 1 year ago

Supposedly, according to the element-desktop issue linked higher up in this thread, Electron 26 fixes this, if anyone wants to test.

(It seemed racy to begin with for the last several Electron versions, though, so I wouldn't be surprised if it was just "fixed" in that it no longer occurs with the current version rather than actually fixed)

999eagle commented 1 year ago

I've changed my override to explicitly use electron_26 for building element-desktop (version 1.11.40 currently) and it seems to work fine. Still on amdgpu and wayland (sway). It does consistently segfault when quitting but that's not exactly a big issue.

Ramblurr commented 1 year ago

Report: element package from nixpkgs master and electron_26, Nvidia GPU, and wayland (hyprland). Still does not work. I still have to launch with NIXOS_OZONE_WL= element-desktop --use-gl=desktop, and even then the flickering and typing lag are extreme, making it unusable.

CobaltCause commented 1 year ago

I've changed my override to explicitly use electron_26 for building element-desktop (version 1.11.40 currently) and it seems to work fine. Still on amdgpu and wayland (sway). It does consistently segfault when quitting but that's not exactly a big issue.

FWIW, same experience here except on Hyprland instead of Sway.

i-am-logger commented 1 year ago

I've changed my override to explicitly use electron_26 for building element-desktop (version 1.11.40 currently) and it seems to work fine. Still on amdgpu and wayland (sway). It does consistently segfault when quitting but that's not exactly a big issue.

FWIW, same experience here except on Hyprland instead of Sway.

same here

Ramblurr commented 11 months ago

Element 1.11.51 is out using electron 27, and it still does not work on nixos with wayland.

CobaltCause commented 11 months ago

Presumably you mean specifically with the NVIDIA proprietary drivers?

Ramblurr commented 11 months ago

Correct, sorry, got lazy yesterday, let me add more detail like my previous comment.

I saw that the develop branch of element-desktop is using Electron v28. So I tested with it, and it works pretty well!

| Electron Version | Is NIXOS_OZONE_WL=1 set? | Was --use-gl=desktop passed? | Does it work? |
| --- | --- | --- | --- |
| 28.0.0 | Yes | Yes | No, no screen |
| " | Yes | No | No, no screen |
| " | No | Yes | Yes (!) |
| " | No | No | Yes (!) |
| 27.1.3 | Yes | Yes | No, no screen |
| " | Yes | No | No, no screen |
| " | No | Yes | No, no screen |
| " | No | No | No, no screen |
| 26.3.0 | Yes | Yes | No, no screen |
| " | Yes | No | No, no screen |
| " | No | Yes | Sort of, screen renders but much input latency and visual flickering |
| " | No | No | No, no screen |

...
      home.packages = [
        (edge.element-desktop.override {electron = pkgs.electron_28;})
      ];
...

Minionflo commented 1 month ago

Any updates on this? I'm having the same errors.

Ramblurr commented 1 month ago

So over the past year the exact behavior has changed week-to-week as I update unstable and upstream packages change.

Currently:

element-desktop is running fine. No flickering, no missing screen, etc. Just works