gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0
12.3k stars 905 forks source link

EGL BadDisplay when using Wayland #5505

Open valaphee opened 5 months ago

valaphee commented 5 months ago

Description When running any example on Wayland using WGPU_BACKEND=gl it crashes with:

[2024-04-07T11:57:37Z ERROR wgpu_hal::gles::egl] EGL 'eglMakeCurrent' code 0x3008: eglMakeCurrent
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: BadDisplay', wgpu-hal/src/gles/egl.rs:305:14

Repro steps WGPU_BACKEND=gl cargo run --bin wgpu-examples

Expected vs observed behavior Running like it does on GL/Vulkan on X11 and Vulkan onWayland

Extra materials backtrace.txt

Platform

[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] Vendor: AMD
[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] Renderer: AMD Radeon RX 6900 XT (radeonsi, navi21, LLVM 17.0.6, DRM 3.57, 6.8.2-arch2-1)
[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] Version: OpenGL ES 3.2 Mesa 24.0.4-arch1.2
[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] SL version: OpenGL ES GLSL ES 3.2
AuthorityFX commented 5 months ago

I'm getting a similar issue.

Surface isn't supported by the adapter.
Friz64 commented 2 months ago

Okay I THINK I have acquired the arcane knowledge of what wgpu-hal/src/gles/egl.rs is trying to do.

I got it working, but I have committed some very cursed crimes in the process.

proof ![image](https://github.com/user-attachments/assets/cb205e23-8d92-4375-a9a2-97f919b5c650)

Essentially the problem is exactly what the error is saying: BadDisplay. On Wayland, get_platform_display is currently fed with khronos_egl::DEFAULT_DISPLAY, here. When I use some "special unsafe debugging tricks" to get the winit window's display handle in there, that error disappears. And then when you completely disable this specialty here, it runs!

By "special unsafe debugging tricks" I mean I stored the display handle in a static beforehand, to be able to access it there. Obviously, a proper solution remains to be found.

I also tried modifying DisplayRef::Wayland to hold the display handle from test_wayland_display(), and then using that to get the platform display, but I think that behaves the same way as khronos_egl::DEFAULT_DISPLAY.

Sooo... I'm thinking the play might be to delay get_platform_display until we have access to the correct display handle in create_surface? Does that make sense? I don't know, I'm going to bed.

tombh commented 1 month ago

I'm sure this must be obvious to most, but I got the examples working with: WGPU_BACKEND=vulkan cargo run --bin wgpu-examples bunnymark

I'm on Asahi Linux (ie ARM64 and no Vulkan support) and just wanted to see the examples running, even though they're merely being emulated on the CPU (device_type: Cpu, driver: "llvmpipe").

My own wpgu project works fine using OpenGL, it's just the majority of the examples that seem to have a problem.

BartBM commented 1 month ago

Bump ^^

Same issue

[2024-09-02T17:32:10Z ERROR wgpu_hal::gles::egl] EGL 'eglMakeCurrent' code 0x3008: eglMakeCurrent
thread 'main' panicked at wgpu-hal/src/gles/egl.rs:298:14:
called `Result::unwrap()` on an `Err` value: BadDisplay
[2024-09-02T17:32:10Z INFO  wgpu_core::instance] Adapter AdapterInfo { name: "AMD Radeon RX 7900 XTX (radeonsi, navi31, LLVM 17.0.6, DRM 3.57, 6.8.0-41-generic)", vendor: 4098, device: 0, device_type: Other, driver: "", driver_info: "4.6 (Core Profile) Mesa 24.0.9-0ubuntu0.1", backend: Gl }
[2024-09-02T17:32:10Z INFO  wgpu_examples::framework] Using AMD Radeon RX 7900 XTX (radeonsi, navi31, LLVM 17.0.6, DRM 3.57, 6.8.0-41-generic) (Gl)
jimblandy commented 4 weeks ago

@MarijnS95 I don't want to distract you from the good work you're doing with #6152 and related topics, but: do you have any idea how we could make progress on this?

Is this just an upstream winit problem that we can't do anything about?

MarijnS95 commented 4 weeks ago

@jimblandy I can surely take a look and comment on the implementation, when booted back into Linux in the coming days. It sounds like @Friz64 already grasped the problem in that passing DEFAULT_DISPLAY is not valid since the implementation now has no clue what Wayland "display connection" to use.

I did run raw-gles on Wayland successfully though (https://github.com/gfx-rs/wgpu/pull/6176#issue-2491324334), but it uses its own initialization path and surface.

MarijnS95 commented 4 weeks ago

Theoretically DEFAULT_DISPLAY is allowed, which simply creates a new socket as documented. But as also noted in "the specialty" linked by @Friz64, that wouldn't be compatible with a Surface created / "imported" from a different socket, hence this hack replaces the context in create_surface() :thinking:

On a side-note, it's equally strange that eglMakeCurrent() is called with self.pbuffer = None, from Device::desotry_buffer() (while it lock()s/currents the context) during startup.


All in all it looks like wgpu got EGL/WGL context handling wrong. There is a reason raw-window-handle separated out RawWindowHandle from RawDisplayHandle, yet wgpu somehow only passes the latter around when a surface is created? This makes it impossible to correctly initialize a context API that is relevant to the current display connection (set up by winit or otherwise) and requires ugly hacks like fn open_x_display(), fn test_wayland_display() and recreating/overwriting the context.

To whomever is planning to fix these bugs, I recommend starting by deleting the entire gles module and using glutin to implement the context API going forward. A good starting point is its example which clearly highlights important platform differences.


EDIT: Just above the hack there's a call to ANativeWindow_setBuffersGeometry(). I equally don't understand why that's necessary, as the config/surface internally will make sure the backing buffer format of the buffer producer is selected/updated. From some testing very long ago (this may have been wrong), calling this function limits the available EGL configurations for that surface on future queries.

BartBM commented 4 weeks ago

Accessing the raw texture for sharing via DMA-BUF would still be possible after rewriting the gles module? In this poc I export the raw WGPU OpenGL texture via DMA-BUF and use it in Slint or Glutin: https://github.com/BartBM/wgpu-dma-buf/blob/main/src/texture_export_wgpu.rs

Some poc code for what I am using it ```rust pub fn export_to_opengl_texture(texture: &Texture) -> Option { let mut native_texture: Option = None; unsafe { texture.as_hal::(|hal_texture| { if let Some(hal_texture) = hal_texture { if let wgpu_hal::gles::TextureInner::Texture { raw, target: _target } = hal_texture.inner { native_texture = Some(raw); } } }); } return native_texture; } ``` ```rust pub fn export_to_dma_buf(adapter: &wgpu::Adapter, native_texture: NativeTexture, width: u32, height: u32) -> (TextureStorageMetadata, RawFd) { unsafe { adapter.as_hal::(|hal_adapter| { match hal_adapter { Some(adapter) => { let raw_display = adapter.adapter_context().raw_display().unwrap().as_ptr(); let raw_context = adapter.adapter_context().raw_context(); let egl_instance = adapter.adapter_context().egl_instance().unwrap(); let fn_egl_create_image_khr = egl_instance.get_proc_address("eglCreateImageKHR"); let egl_create_image_khr = fn_egl_create_image_khr.expect("eglCreateImageKHR not present!"); let egl_function = std::mem::transmute_copy::<_, PFNEGLCREATEIMAGEKHRPROC>(&egl_create_image_khr); let egl_image_khr: EGLImageKHR = (egl_function.unwrap())( raw_display, raw_context, EGL_GL_TEXTURE_2D_KHR, native_texture.0.get() as EGLClientBuffer, std::ptr::null() ); assert!(!egl_image_khr.is_null(), "eglCreateImageKHR failed"); ... ```
MarijnS95 commented 4 weeks ago

@BartBM the gles module has to be rewritten from scratch is what I'm saying :)

Depending on what your needs are, DMA-BUF sharing is also possible with Vulkan. Note that your repository appears to be private, I cannot access it.

BartBM commented 4 weeks ago

@MarijnS95 yes rewriting it is what I meant. Slint only supports displaying textures via GL, not Vulkan: https://releases.slint.dev/1.7.0/docs/rust/slint/struct.borrowedopengltexturebuilder that's why I was asking.. The other option is copying data via a SharedPixelBuffer but this is slow.

grovesNL commented 4 weeks ago

I'm not sure if this is helpful, but for context some of the original workarounds were attempting to support surfaceless/headless GL contexts for initial queries, then being able to optionally associate the context with a surface afterwards through whatever context-specific APIs were available for that platform (e.g., context sharing). The implementation may have drifted over time but that was the original intent at least.

glutin and surfman didn't support this use case at the time, so wgpu would have needed a separate API for GL to fit glutin's API. Maybe glutin supports this now though?

Some other background:

MarijnS95 commented 4 weeks ago

@BartBM it doesn't look like that relies on DMA-BUF (otherwise they should take file descriptors and DRM modifiers?), but instead uses the given texture as a render target when Slint renders its output using OpenGL?

Note that its safety constraints - which describe soundness when other safe-to-call functions are called, or callbacks are invoked - are exactly the ones that @ErichDonGubler shot down in https://github.com/gfx-rs/wgpu/pull/6152#discussion_r1743878285 / https://github.com/gfx-rs/wgpu/issues/6211 🤭


@grovesNL Thanks for that backstory, yeah it seems quite common to create a dummy window or surface to get this done. New glutin design however shows that you can typically get away without it (though I'm incredibly glad APIs like Vulkan exist that don't require a "current context" with render target well before you're actually rendering anything).

Specifically on EGL, this might be tricky if the select EGLConfig is incompatible with a window/surface down the line, requiring you to create a new context. This context could be shared with the dummy context though, allowing them to share resources while their "current-ness" is independent. (Except that on at least wglShareLists() there are some extra constraints when calling).

EDIT: Being all honest, I'm still completely oblivious to wgpu's "async" design and would have to do quite some preliminary reading to understand how the GL context API limitations fit into this design. I.e. what needs to be queried/created up-front before a surface is known and a render pass can start?

BartBM commented 4 weeks ago

@MarijnS95 The slint::BorrowedOpenGLTextureBuilder can only display a texture when it is created in the same OpenGL context. That's why I needed access to the raw texture in WGPU, expose it using DMA-BUF and then import it in Slint. The context is different. The example code is in the link above.

grovesNL commented 4 weeks ago

@MarijnS95

understand how the GL context API limitations fit into this design. I.e. what needs to be queried/created up-front before a surface is known and a render pass can start?

It's similar to the kind of information you'd get during Vulkan initialization so you can select which device to use, e.g., driver information, features, limits (https://docs.rs/wgpu-hal/22.0.0/wgpu_hal/struct.ExposedAdapter.html)

MarijnS95 commented 1 week ago

@BartBM thanks for making the repository public so that we can have a look. Going through DMA-BUF is quite extensive and may only be necessary for interop between APIs and/or when transferring resources across process boundaries.

If all you need is sharing resources between two GL contexts inside the same process (especially Textures) OpenGL context sharing should be exactly what you need. For example GStreamer uses these extensively to share textures and buffers across the pipeline while still being able to (un)current contexts on different threads independently of each other.

This is currently not at all possible with wgpu (and even resulted in some strange internal hacks), and is why @ErichDonGubler opened #6211 to see if we can rectify that. Even though that issue is specific to the current "impossible" soundness requirements on some unsafe functions inside the GLES backend, I don't think we can fix it without properly reimplementing context initialization with support for shared contexts and basing things off of existing RawDisplayHandles for compatibility (etc).


@grovesNL thanks for the link - preliminary reading shows that there's no RawDisplayHandle in there at the moment, so we'd have to find a way to add it.