emilk / egui

egui: an easy-to-use immediate mode GUI in Rust that runs on both web and native
https://www.egui.rs/
Apache License 2.0

Enabling the wgpu feature results in a panic "The surface isn't supported by this adapter" #5269

Open VorpalBlade opened 1 week ago

VorpalBlade commented 1 week ago

Describe the bug

Enabling the wgpu feature causes the program to crash immediately at startup. This can be reproduced even with the hello world example.

❯ cargo run    
    Finished dev [unoptimized + debuginfo] target(s) in 0.17s
     Running `/home/arvid/src/egui/target/debug/hello_world`
wp_linux_drm_syncobj_manager_v1#55: error 0: surface already exists
Protocol error 0 on object wp_linux_drm_syncobj_manager_v1@55: 
[2024-10-15T20:47:48Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_present_modes: ERROR_SURFACE_LOST_KHR
[2024-10-15T20:47:48Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_formats: ERROR_SURFACE_LOST_KHR
thread 'main' panicked at crates/egui-wgpu/src/winit.rs:173:18:
The surface isn't supported by this adapter
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The full backtrace is:

stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: core::panicking::panic_display
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:196:5
   3: core::panicking::panic_str
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:171:5
   4: core::option::expect_failed
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/option.rs:1980:5
   5: core::option::Option<T>::expect
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/option.rs:894:21
   6: egui_wgpu::winit::Painter::configure_surface
             at /home/arvid/src/egui/crates/egui-wgpu/src/winit.rs:170:15
   7: egui_wgpu::winit::Painter::resize_and_generate_depth_texture_view_and_msaa_view
             at /home/arvid/src/egui/crates/egui-wgpu/src/winit.rs:342:9
   8: egui_wgpu::winit::Painter::on_window_resized
             at /home/arvid/src/egui/crates/egui-wgpu/src/winit.rs:405:13
   9: eframe::native::wgpu_integration::WgpuWinitRunning::on_window_event
             at /home/arvid/src/egui/crates/eframe/src/native/wgpu_integration.rs:779:25
  10: <eframe::native::wgpu_integration::WgpuWinitApp as eframe::native::winit_integration::WinitApp>::window_event
             at /home/arvid/src/egui/crates/eframe/src/native/wgpu_integration.rs:457:16
  11: <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event::{{closure}}
             at /home/arvid/src/egui/crates/eframe/src/native/run.rs:285:22
  12: eframe::native::event_loop_context::with_event_loop_context
             at /home/arvid/src/egui/crates/eframe/src/native/event_loop_context.rs:53:5
  13: <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event
             at /home/arvid/src/egui/crates/eframe/src/native/run.rs:280:9
  14: winit::event_loop::dispatch_event_for_app
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/event_loop.rs:642:52
  15: winit::platform::run_on_demand::EventLoopExtRunOnDemand::run_app_on_demand::{{closure}}
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/platform/run_on_demand.rs:76:13
  16: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:294:13
  17: winit::platform_impl::linux::wayland::event_loop::EventLoop<T>::single_iteration
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/platform_impl/linux/wayland/event_loop/mod.rs:398:17
  18: winit::platform_impl::linux::wayland::event_loop::EventLoop<T>::pump_events
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/platform_impl/linux/wayland/event_loop/mod.rs:211:13
  19: winit::platform_impl::linux::wayland::event_loop::EventLoop<T>::run_on_demand
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/platform_impl/linux/wayland/event_loop/mod.rs:181:19
  20: winit::platform_impl::linux::EventLoop<T>::run_on_demand
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/platform_impl/linux/mod.rs:813:56
  21: <winit::event_loop::EventLoop<T> as winit::platform::run_on_demand::EventLoopExtRunOnDemand>::run_on_demand
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/platform/run_on_demand.rs:89:9
  22: winit::platform::run_on_demand::EventLoopExtRunOnDemand::run_app_on_demand
             at /home/arvid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.30.5/src/platform/run_on_demand.rs:75:9
  23: eframe::native::run::run_and_return
             at /home/arvid/src/egui/crates/eframe/src/native/run.rs:300:5
  24: eframe::native::run::run_wgpu::{{closure}}
             at /home/arvid/src/egui/crates/eframe/src/native/run.rs:357:13
  25: eframe::native::run::with_event_loop::{{closure}}
             at /home/arvid/src/egui/crates/eframe/src/native/run.rs:52:12
  26: std::thread::local::LocalKey<T>::try_with
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:270:16
  27: std::thread::local::LocalKey<T>::with
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:246:9
  28: eframe::native::run::with_event_loop
             at /home/arvid/src/egui/crates/eframe/src/native/run.rs:42:5
  29: eframe::native::run::run_wgpu
             at /home/arvid/src/egui/crates/eframe/src/native/run.rs:355:16
  30: eframe::run_native
             at /home/arvid/src/egui/crates/eframe/src/lib.rs:265:13
  31: hello_world::main
             at ./src/main.rs:12:5
  32: core::ops::function::FnOnce::call_once
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5

To Reproduce Steps to reproduce the behavior:

  1. Go to the egui/eframe examples/hello_world directory
  2. Enable wgpu feature for eframe in Cargo.toml
  3. cargo run

Expected behavior

The program should start without errors, or at least not panic (return an Error instead). That error should also be less cryptic. (I'm not a GPU expert, I'm trying to make a simple GUI.)

I don't know what the error means, but it should probably select a suitable surface instead of an unsupported surface. Whatever a surface is in this context.
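To illustrate the requested behavior, here is a minimal sketch of returning a descriptive error instead of panicking when a surface advertises no formats. It does not use the real egui-wgpu or wgpu types: `SurfaceCapabilities`, `SurfaceError`, and `choose_surface_format` are hypothetical names made up for this example, and the assumption that the panic corresponds to an empty format list is taken from the discussion in this thread.

```rust
use std::fmt;

/// Hypothetical stand-in for the formats a surface advertises for one adapter.
#[derive(Debug, Clone, PartialEq)]
pub struct SurfaceCapabilities {
    pub formats: Vec<String>,
}

/// Hypothetical error type; not part of egui-wgpu.
#[derive(Debug, PartialEq)]
pub enum SurfaceError {
    UnsupportedByAdapter { adapter: String },
}

impl fmt::Display for SurfaceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SurfaceError::UnsupportedByAdapter { adapter } => write!(
                f,
                "the window surface reports no usable formats for adapter '{adapter}'; \
                 try another GPU or graphics backend"
            ),
        }
    }
}

impl std::error::Error for SurfaceError {}

/// Picks the first advertised format, or returns an error the caller can
/// report or recover from, instead of calling `expect` and panicking.
pub fn choose_surface_format(
    adapter: &str,
    caps: &SurfaceCapabilities,
) -> Result<String, SurfaceError> {
    caps.formats
        .first()
        .cloned()
        .ok_or_else(|| SurfaceError::UnsupportedByAdapter {
            adapter: adapter.to_owned(),
        })
}
```

With an error value like this, the application could show a readable message or try a different configuration rather than aborting at startup.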


emilk commented 1 day ago

I agree we shouldn't panic here. Maybe @Wumpf wants to take a look at some point? 🙏

Wumpf commented 1 day ago

yep, we definitely shouldn't crash and instead try to forward an error if needed

As to the actual source of the issue (and how to show it better): this might be a tricky one. This piece of log output seems to come directly from Wayland:

wp_linux_drm_syncobj_manager_v1#55: error 0: surface already exists
Protocol error 0 on object wp_linux_drm_syncobj_manager_v1@55: 

Searching for this a bit brings up people recommending downgrading the nvidia driver from 560 to 555, e.g. here. This is also being discussed in the nvidia forum here. Given that you're saying you're only running with Intel, that's probably not the whole story 🤔

VorpalBlade commented 1 day ago

Is it possible to force Egui/eframe with wgpu to use xwayland as a test? Some environment variable perhaps?

Also, vkcube and glxgears both work fine under Wayland with both Intel and nvidia graphics. This indicates to me that both basic platform APIs work (at least to some extent). So it seems like the issue could be on the wgpu side.

Wumpf commented 1 day ago

Hmm yeah, then it is probably something that wgpu could do differently. There are unfortunately quite a few Wayland issues there that are waiting for contributors to look into them (the maintainers either don't have local repro cases (that includes myself) or don't have high priority to look into these (most of the time both)).

You could try running with WAYLAND_DISPLAY unset and see if that makes a difference

VorpalBlade commented 1 day ago

So uh, tried this again. It isn't happening any more (at least in hybrid mode, with and without prime-run, I will reboot to pure Intel and see if it happens there).

Though according to nvidia-smi in hybrid mode without prime-run the WGPU build is still selecting the nVidia GPU. That sounds like a bug (the glow build respects the environment variables prime-run sets). What a mess (and I don't know if this is a nvidia, mesa or wgpu bug).

As I'm on Arch Linux (which is rolling release) that could be from some update or other. In this case the only difference I can find is that I'm now on KDE Plasma 6.2.1 and Kernel 6.11.3. Mesa is the same version.

I did not note down the nvidia driver version before, but now it is 560.35.03-16.

VorpalBlade commented 1 day ago

Unsetting WAYLAND_DISPLAY does not make the program use X for some reason (the program is not listed in the output of xlsclients, while some other programs are). No idea how that works.

Building eframe with the wayland feature disabled doesn't make it use X11 either (huh?).

VorpalBlade commented 1 day ago

Aha, in pure Intel mode it still fails. That is interesting. If WGPU incorrectly uses the nVidia GPU when it is not supposed to (as indicated above, in hybrid mode) that could definitely cause issues.

As the program exits too quickly, it is hard to know. I tried to set a breakpoint where it crashes and use nvidia-smi to see if anything was using the nVidia GPU at that point, but nvidia-smi comes up blank (so unclear).

However, if I run vkcube while booted into pure Intel mode, it does select the nvidia GPU by default, and manages to run on it somehow. nvidia-smi reports that. For vkcube you can override this with --gpu_number 0. I can't spot any way to force one GPU or the other for eframe + WGPU though? Maybe this field has something but the link to the actual type is broken (goes to a page reading "docs.rs failed to build egui-wgpu-0.29.1")

Wumpf commented 1 day ago

Huh, strange, why would docs fail for 0.29? Oo. They're up for 0.28 and the struct hasn't changed. Yes, indeed you should be able to adjust which GPU is chosen by changing the power_preference. To figure out more about the device selection process you may also want to run with RUST_LOG=trace

lucasmerlin commented 1 day ago

Docs were fixed by this: https://github.com/emilk/egui/pull/5204

But we'd need to make a new patch release for the docs to show up

VorpalBlade commented 1 day ago

Setting WGPU_BACKEND=opengl works around this issue. So it is an issue with vulkan + wgpu + some not yet clear combo of mesa/kernel/PRIME/nvidia.

There should be a way (either automatically by egui/eframe or by the application) to fall back to opengl if vulkan isn't working. Possibly this also means being able to fall back to glow if wgpu isn't working and you compiled in support for both.

A prerequisite for this is to not panic though (I would rather not mess around with catch_unwind, I believe it is generally a bad idea to do so).
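The fallback idea can be sketched roughly as follows, assuming initialization returns a `Result` rather than panicking. This is not eframe or wgpu API: the `Backend` enum and `init_with_fallback` are hypothetical names for illustration, and the closure stands in for whatever real renderer setup would be attempted per backend.

```rust
/// Hypothetical backend identifiers, listed by the caller in order of preference.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Vulkan,
    OpenGl,
}

/// Tries each candidate backend in order and returns the first one that
/// initializes successfully, together with the value it produced.
/// If every candidate fails, all collected errors are returned so the
/// caller can report them instead of panicking.
pub fn init_with_fallback<T, E>(
    candidates: &[Backend],
    mut try_init: impl FnMut(Backend) -> Result<T, E>,
) -> Result<(Backend, T), Vec<(Backend, E)>> {
    let mut failures = Vec::new();
    for &backend in candidates {
        match try_init(backend) {
            Ok(value) => return Ok((backend, value)),
            Err(err) => failures.push((backend, err)),
        }
    }
    Err(failures)
}
```

Calling it with `&[Backend::Vulkan, Backend::OpenGl]` and a closure that fails for Vulkan would yield the OpenGL result, which mirrors the manual WGPU_BACKEND=opengl workaround described above.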

"Yes, indeed you should be able to adjust which gpu is chosen by changing the power_preference."

I will try to look into that soon, I'm currently dealing with a rather nasty cold, so it may take a few days.

Wumpf commented 21 hours ago

Yeah, I think that "this surface doesn't have any formats" should be an exclusion criterion for an adapter. That ofc still leaves the separate issue of the surface mysteriously not advertising any formats. I think this should actually already happen at the wgpu level within its "compatible surface" check, but we should be able to do this within egui-wgpu in addition to getting a patch upstream 🤔
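As a rough sketch of that exclusion criterion, the selection could skip any adapter for which the surface advertises no formats. The types below are hypothetical stand-ins, not wgpu or egui-wgpu types; in a real implementation the format list would presumably come from querying the surface's capabilities for each adapter.

```rust
/// Hypothetical description of a GPU adapter and the formats the window
/// surface advertises when used with it.
#[derive(Debug, Clone, PartialEq)]
pub struct AdapterInfo {
    pub name: String,
    pub surface_formats: Vec<String>,
}

/// Returns the first adapter for which the surface advertises at least one
/// format, skipping adapters that would otherwise fail later at configure time.
/// Returns `None` if no adapter qualifies.
pub fn pick_compatible_adapter(adapters: &[AdapterInfo]) -> Option<&AdapterInfo> {
    adapters.iter().find(|a| !a.surface_formats.is_empty())
}
```

This only filters candidates; it does not explain why a surface reports no formats in the first place, which remains the separate issue mentioned above.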

VorpalBlade commented 17 hours ago

Unfortunately WGPU_POWER_PREF (the environment variable variant of this) has absolutely no effect regardless of the value it is set to.

I have not tested messing with this programmatically.

Wumpf commented 15 hours ago

for reference, found this issue on egl-wayland today which might be related. However, the user there reports that glxgears has the issue as well which wasn't the case for you 🤔 https://github.com/NVIDIA/egl-wayland/issues/96