Smithay / smithay

A smithy for rusty wayland compositors
MIT License
1.75k stars 152 forks source link

segfault when enabling EGL hardware-acceleration and stopping the wayland loop #1443

Open m4rch3n1ng opened 1 month ago

m4rch3n1ng commented 1 month ago

trying to spawn smallvil with --comannd "kitty" (or alacritty) doesn't actually spawn them, but instead gives me this error

image

i found out that to fix that, i have to enable EGL hardware-acceleration like i did in https://github.com/m4rch3n1ng/smithay/commit/0d083cfabbf1a78a1e72546e13631ae1cf43f0be.

now, kitty and alacritty spawn just fine, but closing the winit window will now cause a segfault:

image image

for me this also happens in my own compositor and in niri, but i can't get it to reproduce in anvil and i don't know how to fix it.

i don't know where exactly this issue lays, but since i don't actually use unsafe there, i don't think it should segfault.

the segfault happens both when closing the winit window using toplevel.send_close() and stopping the event loop using loop_signal.stop(), with a winit and a udev backend and in both release and debug mode.

YaLTeR commented 1 month ago

From what I've noticed, it happens when you did renderer.bind_wl_display() and have an EGL client running when closing the compositor. A dmabuf client does not cause segfaults, which is likely why it doesn't repro in anvil (which has dmabuf feedback implemented in winit).

Not unlikely that this has something to do with drop order, but I'm pretty sure I tried different drop orders a while ago and they all caused this segfault.

m4rch3n1ng commented 1 month ago

it happens when you did renderer.bind_wl_display() and have an EGL client running when closing the compositor

that's the most consistent way i can reproduce this segfault in my own compositor too, opening weston-terminal and closing the compositor doesn't cause the segfault, but with kitty it does, but in the reproduction with smallvil i did (https://github.com/m4rch3n1ng/smithay/commit/8ad04943532ca6c7d0e1af2ca8c786b66bff0442) it also segfaults with with weston-terminal and even when not opening anything at all, so it's even weirder there

m4rch3n1ng commented 4 weeks ago

i don't know how helpful that is, but you can "fix" (it doesn't happen anymore) the segfault by leaking a reference to the DisplayHandle, so you are probably correct with the drop order being at fault.

diff --git a/smallvil/src/main.rs b/smallvil/src/main.rs
index 6c4c46e33f..78e0f7aaea 100644
--- a/smallvil/src/main.rs
+++ b/smallvil/src/main.rs
@@ -29,6 +29,10 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {

     let display: Display<Smallvil> = Display::new()?;
     let display_handle = display.handle();
+
+    let _display_handle = display_handle.clone();
+    let _leak = Box::leak(Box::new(_display_handle));
+
     let state = Smallvil::new(&mut event_loop, display);

     let mut data = CalloopData {

i don't know how to change the drop order, so i'm sorry for not being able to provide more, but i'll try to get back to it some other time i guess

m4rch3n1ng commented 2 weeks ago

did some more testing, and found out that the segfault in my smallvil reproduction example does not happen at the same place as in mayland and niri.

using rust-gdb the segfault in smallvil happens at https://github.com/Smithay/smithay/blob/3731734d5a0409186a0475238ffe46e2835cee6d/src/backend/egl/display.rs#L1228.

you can properly fix the segfault by dropping the event_loop before the data is dropped.

diff --git a/smallvil/src/main.rs b/smallvil/src/main.rs
index 6c4c46e33f..d3ec176df3 100644
--- a/smallvil/src/main.rs
+++ b/smallvil/src/main.rs
@@ -55,5 +55,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
         // Smallvil is running
     })?;

+    drop(event_loop);
+
     Ok(())
 }

in niri and in my own compositor the segfault happens at https://github.com/Smithay/wayland-rs/blob/48e74e9ef497d62b770b9b862ee4cb0876dc2494/wayland-backend/src/sys/server_impl/mod.rs#L434

and dropping the event_loop earlier doesn't do anything.

m4rch3n1ng commented 2 weeks ago

i did a bisection on my own compositor when the segfault first appears in https://github.com/m4rch3n1ng/mayland/commit/85c9e6659f4be646844f729e932d4f7bb7922d46 it also happens at https://github.com/Smithay/smithay/blob/3731734d5a0409186a0475238ffe46e2835cee6d/src/backend/egl/display.rs#L1228, but when i drop the event_loop earlier it doesn't fix it, instead it happens where it happens now.

a commit later, after a refactor with winit in https://github.com/m4rch3n1ng/mayland/commit/e1d90466d6b6608e6877c3f3d383f4e3674afadb it just happens at https://github.com/Smithay/wayland-rs/blob/48e74e9ef497d62b770b9b862ee4cb0876dc2494/wayland-backend/src/sys/server_impl/mod.rs#L434 now and hasn't changed since.