Closed vanfanel closed 4 months ago
Alright, so why is this a bug in Wayfire and not in those programs you mentioned?
Because it doesn't happen at all on other WLRoots-based compositors, like Sway: the monitor can be power-cycled repeatedly without any problems or crashes.
I have added that information to the first post.
Wayfire creates a temporary output when all the other outputs are disconnected, and this could be the reason why the apps crash. However, it could still be their bug :)
I would need more information in order to work on a potential fix, if this is a bug in Wayfire at all. Like, you or someone else would need to figure out what we're doing wrong so that the apps crash. What stacktrace(s) do they crash with? What is the output of WAYLAND_DEBUG=1 <crashing app> when you reproduce the bug?
@ammen99 This is a debug session of an SDL2 app crashing as I turn the monitor ON (any SDL2 program will do), on a DEBUG Wayfire build:
(gdb) r
Starting program: /root/prince/prince.exe
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a006c0 (LWP 2596)]
[New Thread 0x7fffea8006c0 (LWP 2597)]
[New Thread 0x7fffe9e006c0 (LWP 2598)]
[New Thread 0x7fffe94006c0 (LWP 2599)]
[New Thread 0x7fffe8a006c0 (LWP 2600)]
[New Thread 0x7fffe3c006c0 (LWP 2601)]
[New Thread 0x7fffe32006c0 (LWP 2602)]
Thread 1 "prince.exe" received signal SIGSEGV, Segmentation fault.
0x00007ffff7b097e4 in prepare_zombie.isra ()
from /usr/local/lib/x86_64-linux-gnu/libwayland-client.so.0
(gdb) bt
#0 0x00007ffff7b097e4 in prepare_zombie.isra ()
from /usr/local/lib/x86_64-linux-gnu/libwayland-client.so.0
#1 0x00007ffff7b09ec8 in wl_proxy_destroy ()
from /usr/local/lib/x86_64-linux-gnu/libwayland-client.so.0
#2 0x00007ffff7eedf15 in display_remove_global () from /usr/local/lib/libSDL2-2.0.so.0
#3 0x00007ffff7afdf7a in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#4 0x00007ffff7afd40e in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#5 0x00007ffff7afdb0d in ffi_call () from /lib/x86_64-linux-gnu/libffi.so.8
#6 0x00007ffff7b0e44c in wl_closure_invoke ()
from /usr/local/lib/x86_64-linux-gnu/libwayland-client.so.0
#7 0x00007ffff7b09c4f in dispatch_event ()
from /usr/local/lib/x86_64-linux-gnu/libwayland-client.so.0
#8 0x00007ffff7b0adfb in wl_display_dispatch_queue_pending ()
from /usr/local/lib/x86_64-linux-gnu/libwayland-client.so.0
#9 0x00007ffff7ee9cda in Wayland_PumpEvents () from /usr/local/lib/libSDL2-2.0.so.0
#10 0x00007ffff7e2e133 in SDL_PumpEventsInternal () from /usr/local/lib/libSDL2-2.0.so.0
#11 0x00007ffff7e2e44a in SDL_WaitEventTimeout_REAL () from /usr/local/lib/libSDL2-2.0.so.0
#12 0x0000555555579db0 in process_events ()
#13 0x000055555557be4f in do_simple_wait ()
#14 0x00005555555650c9 in play_level_2 ()
#15 0x00005555555653f7 in play_level ()
#16 0x00005555555655c9 in init_game ()
#17 0x000055555555d80e in start_game ()
#18 0x000055555555f9a3 in pop_main ()
#19 0x000055555555a9d6 in main ()
If you need me to build any lower libs in DEBUG mode, please tell me and I will do so.
And this is what I get with WAYLAND_DEBUG=1 <crashing app>:
Hope it helps. If you need me to do further experiments, please tell me.
Any chance of getting debug symbols from SDL and libwayland-client.so? Or at least tell me which version of SDL2 you are using, so that I can see what display_remove_global in SDL2 does.
Looking at the Wayland log, though, I don't see anything that Wayfire could be doing wrong, so I suspect it is a client bug.
@ammen99 This is the GDB bt with debug builds of both latest stable libwayland (wayland-1.23.0) and latest stable libSDL2 (SDL2-2.30.4):
Thread 1 "prince.exe" received signal SIGSEGV, Segmentation fault.
0x00007ffff7f7cde6 in prepare_zombie (proxy=0x55555567c1e0)
at ../src/wayland-client.c:443
443 for (i = 0; i < interface->event_count; i++) {
(gdb) bt
#0 0x00007ffff7f7cde6 in prepare_zombie (proxy=0x55555567c1e0) at ../src/wayland-client.c:443
#1 0x00007ffff7f7d0ab in proxy_destroy (proxy=0x55555567c1e0) at ../src/wayland-client.c:570
#2 0x00007ffff7f7d189 in wl_proxy_destroy_caller_locks (proxy=0x55555567c1e0)
at ../src/wayland-client.c:598
#3 0x00007ffff7f7d1c2 in wl_proxy_destroy (proxy=0x55555567c1e0) at ../src/wayland-client.c:621
#4 0x00007ffff7de995e in wl_output_destroy (wl_output=0x55555567c1e0)
at gen/wayland-client-protocol.h:5726
#5 0x00007ffff7deb7de in Wayland_free_display (d=0x555555623af0, id=39)
at /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:758
#6 0x00007ffff7debf11 in display_remove_global (data=0x555555623af0, registry=0x555555624d80,
id=39) at /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:887
#7 0x00007ffff7f70f7a in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#8 0x00007ffff7f7040e in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#9 0x00007ffff7f70b0d in ffi_call () from /lib/x86_64-linux-gnu/libffi.so.8
#10 0x00007ffff7f81f50 in wl_closure_invoke (closure=0x55555682e0e0, flags=1,
target=0x555555624d80, opcode=1, data=0x555555623af0) at ../src/connection.c:1228
#11 0x00007ffff7f7eabc in dispatch_event (display=0x55555561f850, queue=0x55555561f948)
at ../src/wayland-client.c:1670
#12 0x00007ffff7f7eda2 in dispatch_queue (display=0x55555561f850, queue=0x55555561f948)
at ../src/wayland-client.c:1816
#13 0x00007ffff7f7f055 in wl_display_dispatch_queue_pending (display=0x55555561f850,
queue=0x55555561f948) at ../src/wayland-client.c:2058
#14 0x00007ffff7f7f0bd in wl_display_dispatch_pending (display=0x55555561f850)
at ../src/wayland-client.c:2121
#15 0x00007ffff7de212d in Wayland_PumpEvents (_this=0x555555623d30)
at /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandevents.c:380
#16 0x00007ffff7c58616 in SDL_PumpEventsInternal (push_sentinel=SDL_TRUE)
at /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:918
#17 0x00007ffff7c589ef in SDL_WaitEventTimeout_REAL (event=0x7fffffffe0d0, timeout=0)
at /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:1093
#18 0x00007ffff7c586e6 in SDL_PollEvent_REAL (event=0x7fffffffe0d0)
at /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:960
#19 0x00007ffff7c4aa90 in SDL_PollEvent (a=0x7fffffffe0d0)
at /root/src/sdl/SDL2-2.30.4/src/dynapi/SDL_dynapi_procs.h:156
#20 0x0000555555579db0 in process_events ()
#21 0x000055555557c3ba in do_wait ()
#22 0x000055555555ea48 in show_title ()
#23 0x000055555555d827 in start_game ()
#24 0x000055555555f9a3 in pop_main ()
#25 0x000055555555a9d6 in main ()
I think this is what you asked me for; if you need anything more, ask me. I really want to help fix this.
Also, since I am using SDL2 2.30.4, this is exactly what display_remove_global is doing:
It's simply calling Wayland_free_display, which is here:
I truly hope it helps. If you need anything else, as I said, just ask me. I'll be leaving debug versions of libwayland, wayfire and SDL2 for now, in case you need me to look at anything else.
I highly recommend updating SDL2. I looked at their git and there were many changes in this part of the code.
@ammen99 If you're looking at the main branch, you're looking at SDL3, which is an unreleased SDL version with many API changes (not directly compatible with SDL2 programs at source level).
Looking at the SDL2 branch here (which is what current SDL2 games and programs use):
https://github.com/libsdl-org/SDL/tree/SDL2
...you can see by looking at https://github.com/libsdl-org/SDL/commits/SDL2/src/video/wayland/SDL_waylandvideo.c that SDL_waylandvideo.c hasn't changed in three months now :(
Ah well, I was looking at SDL3. They have the correct code for this, using wl_output.release instead of wl_output.destroy. Can you try locally patching SDL2 to use wl_output_release instead of wl_output_destroy?
Oh, they bind the interface at version 2. You'd also have to bind wl_output at version 3 in the following line here:
output = wl_registry_bind(d->registry, id, &wl_output_interface, 2);
This is line 696.
@vanfanel I am looking over the SDL code and I cannot understand how it is supposed to work, does it not contain a double free bug? Looking at this here:
https://github.com/libsdl-org/SDL/blob/SDL2/src/video/wayland/SDL_waylandvideo.c#L764
This frees the driverdata:
https://github.com/libsdl-org/SDL/blob/SDL2/src/video/SDL_video.c#L681
Which is the same as the data pointer here:
https://github.com/libsdl-org/SDL/blob/SDL2/src/video/wayland/SDL_waylandvideo.c#L741
Which is used after the free call:
https://github.com/libsdl-org/SDL/blob/SDL2/src/video/wayland/SDL_waylandvideo.c#L765-L769
Is this not a use-after-free? You can verify this by running the game or really any SDL demo app with address sanitizer.
After doing this change and binding wl_output at version 3 as you said, it's still crashing:
Thread 1 "prince.exe" received signal SIGSEGV, Segmentation fault.
wl_proxy_marshal_flags (proxy=0x55555575a2b0, opcode=0, interface=0x0, version=0, flags=1)
at ../src/wayland-client.c:853
853 wl_argument_from_va_list(proxy->object.interface->methods[opcode].signature,
(gdb) bt
#0 wl_proxy_marshal_flags (proxy=0x55555575a2b0, opcode=0, interface=0x0, version=0, flags=1)
at ../src/wayland-client.c:853
#1 0x00007ffff7de99ad in wl_output_release (wl_output=0x55555575a2b0)
at gen/wayland-client-protocol.h:5738
#2 0x00007ffff7deb831 in Wayland_free_display (d=0x555555623af0, id=41)
at /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:762
#3 0x00007ffff7debf64 in display_remove_global (data=0x555555623af0, registry=0x555555624d80,
id=41) at /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:892
#4 0x00007ffff7f70f7a in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#5 0x00007ffff7f7040e in ?? () from /lib/x86_64-linux-gnu/libffi.so.8
#6 0x00007ffff7f70b0d in ffi_call () from /lib/x86_64-linux-gnu/libffi.so.8
#7 0x00007ffff7f81f50 in wl_closure_invoke (closure=0x5555568234e0, flags=1,
target=0x555555624d80, opcode=1, data=0x555555623af0) at ../src/connection.c:1228
#8 0x00007ffff7f7eabc in dispatch_event (display=0x55555561f850, queue=0x55555561f948)
at ../src/wayland-client.c:1670
#9 0x00007ffff7f7eda2 in dispatch_queue (display=0x55555561f850, queue=0x55555561f948)
at ../src/wayland-client.c:1816
#10 0x00007ffff7f7f055 in wl_display_dispatch_queue_pending (display=0x55555561f850,
queue=0x55555561f948) at ../src/wayland-client.c:2058
#11 0x00007ffff7f7f0bd in wl_display_dispatch_pending (display=0x55555561f850)
at ../src/wayland-client.c:2121
#12 0x00007ffff7de212d in Wayland_PumpEvents (_this=0x555555623d30)
at /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandevents.c:380
#13 0x00007ffff7c58616 in SDL_PumpEventsInternal (push_sentinel=SDL_TRUE)
at /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:918
#14 0x00007ffff7c589ef in SDL_WaitEventTimeout_REAL (event=0x7fffffffe0d0, timeout=0)
at /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:1093
#15 0x00007ffff7c586e6 in SDL_PollEvent_REAL (event=0x7fffffffe0d0)
at /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:960
#16 0x00007ffff7c4aa90 in SDL_PollEvent (a=0x7fffffffe0d0)
at /root/src/sdl/SDL2-2.30.4/src/dynapi/SDL_dynapi_procs.h:156
#17 0x0000555555579db0 in process_events ()
#18 0x000055555557c3ba in do_wait ()
#19 0x000055555555ea48 in show_title ()
#20 0x000055555555d827 in start_game ()
#21 0x000055555555f9a3 in pop_main ()
#22 0x000055555555a9d6 in main ()
Now let me try that address sanitizer thing (I have to investigate how that works)
@ammen99 Here's the AddressSanitizer log of the example game, with AddressSanitizer support built into both SDL2 and the game itself:
~/prince$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libasan.so.8 prince.exe
=================================================================
==13225==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000191e50 at pc 0x74e41c025510 bp 0x7fff094664b0 sp 0x7fff094664a8
READ of size 8 at 0x611000191e50 thread T0
#0 0x74e41c02550f in Wayland_free_display /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:757
#1 0x74e41c0261c3 in display_remove_global /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:892
#2 0x74e41b7e8f79 (/lib/x86_64-linux-gnu/libffi.so.8+0x6f79)
#3 0x74e41b7e840d (/lib/x86_64-linux-gnu/libffi.so.8+0x640d)
#4 0x74e41b7e8b0c in ffi_call (/lib/x86_64-linux-gnu/libffi.so.8+0x6b0c)
#5 0x74e41b7f9f4f in wl_closure_invoke ../src/connection.c:1228
#6 0x74e41b7f6abb in dispatch_event ../src/wayland-client.c:1670
#7 0x74e41b7f6da1 in dispatch_queue ../src/wayland-client.c:1816
#8 0x74e41b7f7054 in wl_display_dispatch_queue_pending ../src/wayland-client.c:2058
#9 0x74e41b7f70bc in wl_display_dispatch_pending ../src/wayland-client.c:2121
#10 0x74e41c00b57f in Wayland_PumpEvents /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandevents.c:380
#11 0x74e41bc88d1f in SDL_PumpEventsInternal /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:918
#12 0x74e41bc893b0 in SDL_WaitEventTimeout_REAL /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:1093
#13 0x74e41bc88e76 in SDL_PollEvent_REAL /root/src/sdl/SDL2-2.30.4/src/events/SDL_events.c:960
#14 0x74e41bc76026 in SDL_PollEvent /root/src/sdl/SDL2-2.30.4/src/dynapi/SDL_dynapi_procs.h:156
#15 0x5b42e42b70e7 in process_events /root/src/SDLPoP-1.23/src/seg009.c:3353
#16 0x5b42e42baa38 in idle /root/src/SDLPoP-1.23/src/seg009.c:3698
#17 0x5b42e425ee3c in show_title /root/src/SDLPoP-1.23/src/seg000.c:1908
#18 0x5b42e425a954 in start_game /root/src/SDLPoP-1.23/src/seg000.c:235
#19 0x5b42e426163e in pop_main /root/src/SDLPoP-1.23/src/seg000.c:149
#20 0x5b42e4252e62 in main /root/src/SDLPoP-1.23/src/main.c:27
#21 0x74e41b846249 (/lib/x86_64-linux-gnu/libc.so.6+0x27249)
#22 0x74e41b846304 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27304)
#23 0x5b42e4253270 in _start (/root/prince/prince.exe+0x2c270)
0x611000191e50 is located 16 bytes inside of 224-byte region [0x611000191e40,0x611000191f20)
freed by thread T0 here:
#0 0x74e41c6b76a8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x74e41bdbaa1a in real_free /root/src/sdl/SDL2-2.30.4/src/stdlib/SDL_malloc.c:5199
#2 0x74e41bdbae48 in SDL_free_REAL /root/src/sdl/SDL2-2.30.4/src/stdlib/SDL_malloc.c:5339
#3 0x74e41bf3407d in SDL_DelVideoDisplay /root/src/sdl/SDL2-2.30.4/src/video/SDL_video.c:676
#4 0x74e41c0254ea in Wayland_free_display /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:756
#5 0x74e41c0261c3 in display_remove_global /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:892
#6 0x74e41b7e8f79 (/lib/x86_64-linux-gnu/libffi.so.8+0x6f79)
previously allocated by thread T0 here:
#0 0x74e41c6b89cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x74e41bdba9b6 in real_malloc /root/src/sdl/SDL2-2.30.4/src/stdlib/SDL_malloc.c:5196
#2 0x74e41bdbad2d in SDL_malloc_REAL /root/src/sdl/SDL2-2.30.4/src/stdlib/SDL_malloc.c:5295
#3 0x74e41c024f61 in Wayland_add_display /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:702
#4 0x74e41c0258e5 in display_handle_global /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:837
#5 0x74e41b7e8f79 (/lib/x86_64-linux-gnu/libffi.so.8+0x6f79)
SUMMARY: AddressSanitizer: heap-use-after-free /root/src/sdl/SDL2-2.30.4/src/video/wayland/SDL_waylandvideo.c:757 in Wayland_free_display
Shadow bytes around the buggy address:
0x0c228002a370: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c228002a380: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c228002a390: fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa fa
0x0c228002a3a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c228002a3b0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
=>0x0c228002a3c0: fa fa fa fa fa fa fa fa fd fd[fd]fd fd fd fd fd
0x0c228002a3d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c228002a3e0: fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa fa
0x0c228002a3f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c228002a400: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
0x0c228002a410: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==13225==ABORTING
Is this what you wanted to see?
Yes, exactly, the bug is as I described in my previous comment. Feel free to let the SDL devs know about it, just give them the ASan report and my explanation and they'll fix it :)
@ammen99 Ah, neat! I will let them know!
Describe the bug
When an SDL2 program is running and the monitor is turned off, then on again, the program always crashes. This happens with any SDL2 program, windowed or fullscreen. RetroArch doesn't seem to be affected at all.
SDL2 programs survive monitor power-cycling without any problems on other WLRoots-based compositors like Sway.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Turning the monitor off, then on again, should not crash SDL2 programs.
Screenshots or stacktrace
The SDL2 programs DO survive the monitor being turned off: what crashes them is turning the monitor back on. Here's what appears when I launch an example SDL2 program (SDLPoP):
When the monitor is turned off, there are no messages on the console. The program is still running while the monitor is off.
And this is what appears on the console from which Wayfire is run when the monitor is turned on again (after being turned off):
Wayfire version
Latest Git code.