gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0
12.13k stars 881 forks source link

[wayland] crash in eglTerminate on program exit #4650

Open mcclure opened 10 months ago

mcclure commented 10 months ago

A test program that uses many wgpu features, but otherwise is fairly simple, segfaults on exit when run with winit 0.29. I don't know if this is a winit problem or a wgpu problem but wgpu_hal is in the gdb backtrace. I'm filing here first.

The GDB backtrace looks like

(gdb) bt

#0  0x00007ffff7fb1845 in ?? () from /lib/x86_64-linux-gnu/libwayland-client.so.0
#1  0x00007ffff7fb19a3 in ?? () from /lib/x86_64-linux-gnu/libwayland-client.so.0
#2  0x00007ffff7fb34d2 in wl_proxy_marshal_array_flags () from /lib/x86_64-linux-gnu/libwayland-client.so.0
#3  0x00007ffff7fb3f59 in wl_proxy_marshal_flags () from /lib/x86_64-linux-gnu/libwayland-client.so.0
#4  0x00007ffff46d9937 in ?? () from /lib/x86_64-linux-gnu/libEGL_mesa.so.0
#5  0x00007ffff46ce988 in ?? () from /lib/x86_64-linux-gnu/libEGL_mesa.so.0
#6  0x00007ffff46cee80 in ?? () from /lib/x86_64-linux-gnu/libEGL_mesa.so.0
#7  0x00007ffff46bc97c in ?? () from /lib/x86_64-linux-gnu/libEGL_mesa.so.0
#8  0x0000555556b7d8e2 in khronos_egl::{impl#98}::eglTerminate<libloading::safe::Library> (self=0x5555575d2cb0, 
    display=0x5555575dfc80)
    at /home/mcc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/khronos-egl-6.0.0/src/lib.rs:2321
#9  khronos_egl::Instance<khronos_egl::Dynamic<libloading::safe::Library, khronos_egl::EGL1_4>>::terminate<khronos_egl::Dynamic<libloading::safe::Library, khronos_egl::EGL1_4>> (self=0x5555575d2cb0, display=...)
    at /home/mcc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/khronos-egl-6.0.0/src/lib.rs:1181
#10 0x0000555556bfbbd4 in wgpu_hal::gles::egl::{impl#10}::drop (self=0x5555575dd9d8) at src/gles/egl.rs:621
#11 0x0000555556bc3177 in core::ptr::drop_in_place<wgpu_hal::gles::egl::Inner> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#12 0x0000555556acb27b in core::ptr::drop_in_place<core::cell::UnsafeCell<wgpu_hal::gles::egl::Inner>> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#13 0x0000555556ac893f in core::ptr::drop_in_place<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, wgpu_hal::gles::egl::Inner>> () at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#14 0x0000555556ac9e0c in core::ptr::drop_in_place<wgpu_hal::gles::egl::Instance> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
--Type <RET> for more, q to quit, c to continue without paging--c

#15 0x0000555556acb3f3 in core::ptr::drop_in_place<core::option::Option<wgpu_hal::gles::egl::Instance>> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#16 0x000055555696e436 in core::ptr::drop_in_place<wgpu_core::instance::Instance> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#17 0x0000555556978a17 in core::ptr::drop_in_place<wgpu_core::global::Global<wgpu_core::identity::IdentityManagerFactory>> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#18 0x000055555696e634 in core::ptr::drop_in_place<wgpu::backend::direct::Context> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#19 0x000055555696ec70 in core::ptr::drop_in_place<dyn wgpu::context::DynContext> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#20 0x000055555686ea1a in alloc::sync::Arc<dyn wgpu::context::DynContext, alloc::alloc::Global>::drop_slow<dyn wgpu::context::DynContext, alloc::alloc::Global> (self=0x7ffffffeef10)
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/sync.rs:1749
#21 0x00005555568700d3 in alloc::sync::{impl#33}::drop<dyn wgpu::context::DynContext, alloc::alloc::Global> (
    self=0x7ffffffeef10) at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/sync.rs:2405
#22 0x0000555556972c8b in core::ptr::drop_in_place<alloc::sync::Arc<dyn wgpu::context::DynContext, alloc::alloc::Global>> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#23 0x0000555555a8ea47 in core::ptr::drop_in_place<wgpu::PipelineLayout> ()
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ptr/mod.rs:497
#24 0x0000555555b030a3 in wgpu_hello::run::{async_fn#0} () at src/main.rs:742
#25 0x0000555555a859e3 in pollster::block_on<wgpu_hello::run::{async_fn_env#0}> (fut=...)
    at /home/mcc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128
#26 0x0000555555afc6fb in wgpu_hello::main () at src/main.rs:758

DETAILS/REPRO

I have a project https://github.com/mcclure/webgpu-tutorial-rs . As the name suggests, it is a (mostly finished) WebGPU tutorial application. It is based on the hello-triangle example.

As of a few days ago, commit a8e1516 was running wgpu 0.16 and winit 0.29-beta (a very early 0.29 beta, so that it had still the 0.28 API surface). This week, I attempted to upgrade. In commit 50c079e I upgraded to winit 0.29 final (while keeping wgpu fixed at 0.16). This had the effect of making the application segfault at close time. Because 0.16 is old, I upgraded to wgpu 0.18. This lead to a different problem #4630, fixed in PR #4635. Because I am trying to test with 0.18 I backported to 0.18 in PR #4649 which you can find the example code for in branch webgpu-tutorial-0.18-fork. You can test by checking out the webgpu-tutorial-0.18-fork branch and running cargo run (or commit 50c079e for the 0.16 version). Warning: When run successfully, this example has audio which currently may be somewhat loud.

Notice I select rwh_05 in Cargo.toml as directed by the wgpu matrix channel.

Platform

I am running with Rust 1.73.0 on Ubuntu 23.10 with Wayland and an almost entirely stock desktop environment. I am on a "Lenovo ThinkPad T14 Gen 3" with gpu "AMD Ryzen™ 5 PRO 6650U with Radeon™ Graphics × 12". The failure occurs 100% of the time. I have not tested on other platforms.

Both the 0.16 and 0.18 wgpu branch implicated wgpu_hal in the bt, though I don't think they were the same bt.

Expected behavior

In Rust, even if I do something wrong, I should get a panic with a clear error message rather than a segfault.

Note

I would be happy to file this bug on winit instead (since it was upgrading winit, not upgrading wgpu, that caused the problem) but I need to be able to plausibly say "I asked wgpu and they said it's not their fault (even though their frames are all over the gdb stack)".

SludgePhD commented 10 months ago

I've also observed these for a while. They can be avoided by requesting an instance with Backends::PRIMARY instead of the default Backends::all(), since they are caused by the GL backend.

mcclure commented 10 months ago

I tested and this problem is Wayland exclusive. If I log out, log back in to "Ubuntu with xorg", and repeat the tests, I do not get the crash in either 0.16 or 0.18.

mcclure commented 10 months ago

They can be avoided by requesting an instance with Backends::PRIMARY instead of the default Backends::all(), since they are caused by the GL backend.

Since not all systems have Vulkan, surely you want to include support for the GL backend?

SludgePhD commented 10 months ago

Since not all systems have Vulkan, surely you want to include support for the GL backend?

Yeah, YMMV. All systems I care about do support Vulkan, and I believe the GL backend has some other bugs too, so turning it off made sense.

Wumpf commented 3 months ago

Issue still happens in 0.20:

Didn't happen in 0.19 because of another bug :/