kelvie opened 1 year ago
Looks like it's in wlroots? assertion=0x56484c13b028 "wl_resource_instance_of(resource, &wl_surface_interface, &surface_implementation)", file=0x56484c137ce7 "types/wlr_surface.c", line=612, function=0x56484c13c160 "wlr_surface_from_resource") at assert.c:101
Using whatever debug symbols I can find from debuginfod.elfutils.org:
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `gamescope --generate-drm-mode fixed --xwayland-count 2 -w 1280 -h 800 --default'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
[Current thread is 1 (Thread 0x7f01a77fe6c0 (LWP 1233))]
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f01c24f96b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f01c24a9958 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007f01c249353d in __GI_abort () at abort.c:79
#4 0x00007f01c249345c in __assert_fail_base
(fmt=0x7f01c260da50 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56484c13b028 "wl_resource_instance_of(resource, &wl_surface_interface, &surface_implementation)", file=0x56484c137ce7 "types/wlr_surface.c", line=612, function=<optimized out>) at assert.c:92
#5 0x00007f01c24a2486 in __GI___assert_fail
(assertion=0x56484c13b028 "wl_resource_instance_of(resource, &wl_surface_interface, &surface_implementation)", file=0x56484c137ce7 "types/wlr_surface.c", line=612, function=0x56484c13c160 "wlr_surface_from_resource") at assert.c:101
#6 0x000056484c0c2d7a in ()
#7 0x000056484c097eec in ()
#8 0x000056484c09bbc7 in ()
#9 0x00007f01c28382f3 in std::execute_native_thread_routine(void*) (__p=0x56484f261230) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
#10 0x00007f01c24f78fd in start_thread (arg=<optimized out>) at pthread_create.c:442
#11 0x00007f01c2579a60 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
I can trigger this pretty reliably, and can test further if you can provide the debug symbols or a new gamescope binary/libraries (for Steam Deck version 3.4.2).
Looking at the code, it seems like the only function in wlroots that makes this assertion is wlr_surface_from_resource, which gamescope calls in 3 places:
gamescope_xwayland_server_t::set_wl_id
gamescope_xwayland_server_t::handle_override_window_content
gamescope_tearing_get_tearing_control
If I had to guess, it'd be the handle_override_window_content call 😁
It looks like it gets called from VkLayer_FROG_gamescope_wsi.cpp. I'd have to learn more about what these layers do, but presumably it's being called with a non-surface resource (and this happens while the keyboard overlay is being created over the Xwayland window that runs the game?)
Reading the handle_override_window_content code, it's probably not that: at the only call site I can see, it creates a new surface, checks it for NULL, and then passes it into handle_override_window_content, so it's probably a valid surface.
Perhaps it's set_wl_id, which is called here:
Maybe we need to check if the surface is valid before doing this?
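A minimal sketch of that kind of check, written against simplified mock types rather than the real libwayland structs (mock_interface, mock_resource and surface_from_resource_checked are all hypothetical names). The idea is to confirm the resource's interface is wl_surface before treating it as one, and to ignore the message instead of hitting the assert. In real code, libwayland-server's wl_resource_get_class() returns the interface name and could serve for the comparison, assuming that fits how gamescope looks up the resource; wlroots' own check also compares the implementation pointer, which is private to wlroots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for wl_interface / wl_resource.
 * These are NOT the real libwayland structs. */
struct mock_interface {
    const char *name;
};

struct mock_resource {
    const struct mock_interface *interface;
    uint32_t id;
    void *data; /* for a wl_surface this would point at the surface object */
};

static const struct mock_interface mock_wl_surface_interface = { "wl_surface" };
static const struct mock_interface mock_tearing_control_interface = {
    "gamescope_surface_tearing_control_v1"
};

/* Hypothetical guarded lookup: return the surface data only if the resource
 * really is a wl_surface; otherwise return NULL instead of aborting. */
static void *surface_from_resource_checked(const struct mock_resource *resource)
{
    if (resource == NULL || resource->interface == NULL)
        return NULL;
    if (strcmp(resource->interface->name, "wl_surface") != 0)
        return NULL; /* wrong object type: skip it rather than assert */
    return resource->data;
}
```

This only avoids the abort: if the ID happened to point at a different, unrelated wl_surface, the check would pass and the X11 window would be paired with the wrong surface, so it is a band-aid rather than a fix for the underlying problem.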
This is being called in an X11 message handler:
Are there instructions for building gamescope so that it works on SteamOS? I tried building it myself from master, and SteamOS didn't launch for some reason. If I can reproduce the crash with debug symbols, I can dig deeper into this.
Ah, I was building the wrong branch. I built it from jupiter/3.4 and reproduced this inside GDB.
(gdb) bt
#0 0x00007f3c35d6e64c in () at /usr/lib/libc.so.6
#1 0x00007f3c35d1e958 in raise () at /usr/lib/libc.so.6
#2 0x00007f3c35d0853d in abort () at /usr/lib/libc.so.6
#3 0x00007f3c35d0845c in () at /usr/lib/libc.so.6
#4 0x00007f3c35d17486 in () at /usr/lib/libc.so.6
#5 0x00005634211e1cbf in wlr_surface_from_resource (resource=0x56342245db20) at ../subprojects/wlroots/types/wlr_surface.c:612
#6 0x000056342115f4ee in gamescope_xwayland_server_t::set_wl_id(wlserver_x11_surface_info*, unsigned int)
(this=0x563422841190, surf=0x7f3c140a87c8, id=80) at ../src/wlserver.cpp:1296
#7 0x000056342113d054 in handle_wl_surface_id(xwayland_ctx_t*, win*, uint32_t) (ctx=0x7f3c14000f30, w=0x7f3c140a86b0, surfaceID=80)
at ../src/steamcompmgr.cpp:3675
#8 0x000056342113d52f in handle_client_message(xwayland_ctx_t*, XClientMessageEvent*) (ctx=0x7f3c14000f30, ev=0x7f3c1bffe920)
at ../src/steamcompmgr.cpp:3803
#9 0x00005634211413a8 in dispatch_x11(xwayland_ctx_t*) (ctx=0x7f3c14000f30) at ../src/steamcompmgr.cpp:4978
#10 0x000056342114396a in steamcompmgr_main(int, char**) (argc=28, argv=0x7ffc926e1a78) at ../src/steamcompmgr.cpp:5567
#11 0x000056342115b7d9 in steamCompMgrThreadRun(int, char**) (argc=28, argv=0x7ffc926e1a78) at ../src/main.cpp:602
#12 0x000056342115bf21 in std::__invoke_impl<void, void (*)(int, char**), int, char**>(std::__invoke_other, void (*&&)(int, char**), int&&, char**&&) (__f=@0x5634227fbd18: 0x56342115b79f <steamCompMgrThreadRun(int, char**)>) at /usr/include/c++/12.2.0/bits/invoke.h:61
#13 0x000056342115be62 in std::__invoke<void (*)(int, char**), int, char**>(void (*&&)(int, char**), int&&, char**&&)
(__fn=@0x5634227fbd18: 0x56342115b79f <steamCompMgrThreadRun(int, char**)>) at /usr/include/c++/12.2.0/bits/invoke.h:96
#14 0x000056342115bd95 in std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x5634227fbd08) at /usr/include/c++/12.2.0/bits/std_thread.h:252
#15 0x000056342115bd32 in std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> >::operator()() (this=0x5634227fbd08)
at /usr/include/c++/12.2.0/bits/std_thread.h:259
#16 0x000056342115bd16 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> > >::_M_run()
(this=0x5634227fbd00) at /usr/include/c++/12.2.0/bits/std_thread.h:210
#17 0x00007f3c360ad2f3 in std::execute_native_thread_routine(void*) (__p=0x5634227fbd00) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
#18 0x00007f3c35d6c8fd in () at /usr/lib/libc.so.6
#19 0x00007f3c35deea60 in () at /usr/lib/libc.so.6
Looks like the resource passed is:
(gdb) up 5
#5 0x00005634211e1cbf in wlr_surface_from_resource (resource=0x56342245db20) at ../subprojects/wlroots/types/wlr_surface.c:612
612 ../subprojects/wlroots/types/wlr_surface.c: No such file or directory.
(gdb) print resource
$1 = (struct wl_resource *) 0x56342245db20
(gdb) print *resource
$3 = {object = {interface = 0x5634212a0f20 <gamescope_surface_tearing_control_v1_interface>,
implementation = 0x5634212a1060 <surface_tearing_control_impl>, id = 80}, destroy = 0x0, link = {prev = 0x0, next = 0x0}, destroy_signal = {
listener_list = {prev = 0x56342245db50, next = 0x56342245db50}}, client = 0x5634225ef820, data = 0x5634217dddb0}
(gdb)
This is the check that gets asserted (and aborts):
https://chromium.googlesource.com/external/wayland/wayland/+/refs/heads/1.5/src/wayland-server.c#627
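From the code, it's expecting a wl_surface resource (wl_surface_interface with wlroots' surface_implementation), but it got a gamescope_surface_tearing_control_v1_interface object whose implementation is surface_tearing_control_impl.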
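To make the assert concrete, here is a simplified model of the check, based on my reading of libwayland's wl_resource_instance_of(): the object's interface must match (same pointer or same interface name) and its implementation pointer must be the one passed in. The model_* types and values are hypothetical stand-ins that mirror the gdb dump, not the real structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified mirrors of wl_interface and wl_object. */
struct model_interface { const char *name; };

struct model_object {
    const struct model_interface *interface;
    const void *implementation;
    uint32_t id;
};

/* Interfaces compare equal if they are the same pointer or share a name. */
static bool model_interface_equal(const struct model_interface *a,
                                  const struct model_interface *b)
{
    return a == b || strcmp(a->name, b->name) == 0;
}

/* Both the interface AND the implementation pointer must match. */
static bool model_instance_of(const struct model_object *obj,
                              const struct model_interface *interface,
                              const void *implementation)
{
    return model_interface_equal(obj->interface, interface) &&
           obj->implementation == implementation;
}

/* Placeholders for the objects named in the gdb dump. */
static const struct model_interface model_wl_surface_interface = { "wl_surface" };
static const struct model_interface model_tearing_interface = {
    "gamescope_surface_tearing_control_v1"
};
static const int model_surface_implementation = 1; /* wlroots' vtable */
static const int model_tearing_control_impl = 2;   /* gamescope's vtable */
```

With the values from the gdb dump, the interface comparison already fails, so the assert in wlr_surface_from_resource fires and gamescope aborts.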
Digging into the handler, it seems this reacts to the WL_SURFACE_ID message, which leads to this:
https://gitlab.freedesktop.org/xorg/xserver/-/issues/1157
I guess this is what's happening, and it appears it was fixed by https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/976 which was merged 3 months ago?
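A toy model of why a Wayland object ID sent over X11 can go stale: object IDs on a connection are recycled once an object is destroyed, so by the time the X11 handler looks up ID 80, it may name a different object. The toy_* table, its LIFO free list, and the starting ID of 80 are hypothetical choices made to line up with the gdb dump; libwayland's real ID map differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy object table whose IDs are recycled after destruction.
 * The LIFO free list is an assumption for illustration. */
#define MAX_OBJECTS 128

struct toy_table {
    const char *type[MAX_OBJECTS]; /* NULL means the slot is free */
    uint32_t free_list[MAX_OBJECTS];
    size_t free_count;
    uint32_t next_id;
};

static void toy_init(struct toy_table *t, uint32_t first_id)
{
    memset(t, 0, sizeof(*t));
    t->next_id = first_id;
}

/* Reuse the most recently freed ID if there is one, else take a fresh one. */
static uint32_t toy_create(struct toy_table *t, const char *type)
{
    uint32_t id = t->free_count > 0 ? t->free_list[--t->free_count]
                                    : t->next_id++;
    assert(id < MAX_OBJECTS);
    t->type[id] = type;
    return id;
}

static void toy_destroy(struct toy_table *t, uint32_t id)
{
    t->type[id] = NULL;
    t->free_list[t->free_count++] = id;
}

/* What a late lookup of an ID (e.g. from an X11 message) would find. */
static const char *toy_lookup(const struct toy_table *t, uint32_t id)
{
    return id < MAX_OBJECTS ? t->type[id] : NULL;
}
```

Creating a wl_surface, destroying it, and then creating a tearing-control object hands out ID 80 twice, so a late lookup of the stale ID finds gamescope_surface_tearing_control_v1 instead of wl_surface, which would be consistent with the gdb dump.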
Oh, looking at that issue, it was reported by one of the gamescope devs, @emersion. Is this what we're seeing here?
OK, and even with the latest xserver, it looks like all that MR does is implement a new protocol; we still need to stop using WL_SURFACE_ID in gamescope to avoid this race.
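A sketch of why a serial-based scheme avoids this. To my understanding, the new protocol is xwayland_shell_v1, where Xwayland tags each surface with a 64-bit serial (sent on the X11 side as WL_SURFACE_SERIAL) instead of an object ID. Serials only ever increase, so a stale one can fail to match but can never alias a newer object. The registry_* functions below are a hypothetical, simplified stand-in for that matching, not gamescope's or wlroots' code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry matching X11 windows to surfaces by a 64-bit
 * serial that is never reused, instead of by a recyclable object ID. */
#define MAX_SURFACES 64

struct serial_entry {
    uint64_t serial; /* 0 = unused slot */
    void *surface;
};

struct serial_registry {
    struct serial_entry entries[MAX_SURFACES];
    uint64_t next_serial;
};

static void registry_init(struct serial_registry *r)
{
    for (size_t i = 0; i < MAX_SURFACES; i++) {
        r->entries[i].serial = 0;
        r->entries[i].surface = NULL;
    }
    r->next_serial = 1;
}

/* Wayland side: give a new surface a fresh, monotonically increasing serial.
 * Slots are reused, serials are not. Returns 0 if the table is full. */
static uint64_t registry_add(struct serial_registry *r, void *surface)
{
    for (size_t i = 0; i < MAX_SURFACES; i++) {
        if (r->entries[i].serial == 0) {
            r->entries[i].serial = r->next_serial++;
            r->entries[i].surface = surface;
            return r->entries[i].serial;
        }
    }
    return 0;
}

static void registry_remove(struct serial_registry *r, void *surface)
{
    for (size_t i = 0; i < MAX_SURFACES; i++) {
        if (r->entries[i].surface == surface) {
            r->entries[i].serial = 0;
            r->entries[i].surface = NULL;
        }
    }
}

/* X11 side: resolve the serial from the client message. A stale serial
 * finds nothing; it cannot resolve to a newer, unrelated surface. */
static void *registry_lookup(const struct serial_registry *r, uint64_t serial)
{
    if (serial == 0)
        return NULL;
    for (size_t i = 0; i < MAX_SURFACES; i++) {
        if (r->entries[i].serial == serial)
            return r->entries[i].surface;
    }
    return NULL;
}
```

Unlike the object-ID case, destroying a surface and creating another one reuses the slot but hands out a new serial, so the old window's serial simply resolves to nothing instead of to the wrong object.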
I am assuming you have Decky Loader or something similar installed on your Steam Deck, and that it is the culprit triggering this behaviour.
We should move to the new system in Gamescope, but the pieces have just landed.
@Joshua-Ashton I don't have decky loader or any of that nonsense, this is pretty stock. I do have one of the official keyboard themes though.
Hmmm, that's interesting then. We should try and ship the fix in Gamescope soon either way.
Thank you. Is there a workaround, like "wait 5 seconds for the keyboard to close" or something? Presumably the keyboard window is being destroyed so fast that its ID gets reused, right? (Peeking at the CPU in top while SSH'd in, popping up the keyboard seems to use a lot of CPU.)
If anyone else (like me) wants to play Dwarf Fortress on their Steam Deck for however much time off they have left, I put together a hack that makes this happen less often: https://github.com/kelvie/gamescope/releases/tag/jupiter-3.4-kelvie
Any progress with the fix? As a side note, with my patch I haven't hit the keyboard crash even once in many more hours of playing and popping the keyboard up and down, so it may be worth pushing it in the interim, since this bug seems to affect a lot of other people.
This issue has made on-screen keyboard completely unusable in games for me, which in turn made many games unplayable. So yes, it would be great to have even a hacky workaround fix.
I see the changes have landed on master. Is there any indication of when this will land in a Steam Deck update? I do understand there are a bunch of upstream dependencies that need to be sorted out first.
More details (2 core dumps attached to the steam-for-linux ticket):
https://github.com/ValveSoftware/SteamOS/issues/945
Here's a stack trace https://gist.github.com/kelvie/8ccffb3bddf53c6bbf7618295b789b96
It seems it's happening to more than just me:
https://old.reddit.com/r/SteamDeck/comments/zv5sys/anyone_else_getting_a_full_system_crash_when/