ValveSoftware / gamescope

SteamOS session compositing window manager

Decky Loader XWayland Surface Instability #613

Open Sterophonick opened 2 years ago

Sterophonick commented 2 years ago

Hello!

On my Steam Deck, I've been experiencing various crashes in Game Mode. When these happen, the currently running game stops, and then gamescope and Steam restart. The power is not cut, as Bluetooth devices remain connected.

Most commonly, these crashes occur when using the in-game overlays, but there are a couple of cases where it happened while playing a game (shown in attached video).

I have run a memtest86 and everything came back as good.

https://user-images.githubusercontent.com/34801996/188516428-b7dc11fe-0122-492a-b4cd-fba37ce48f6e.mp4

i am so good at doom that i crash my deck

Right here, I have some crash dumps and backtraces from two crashes from using the overlays.

(PID: 1142) gamescope_1142_bt.log gamescope_1142_info.log gamescope_1142.zip (dump)

(PID: 3667) gamescope_3667_bt.log gamescope_3667_info.log gamescope_3667.zip (dump)

My Steam Deck is currently on SteamOS 3.3.1 (20220817.1), and the problem, while uncommon, seems to persist even after switching OS branches or even refreshing the OS.

Sterophonick commented 2 years ago

Got another crash after opening the on-screen keyboard a handful of times.

gamescope_13003_bt.log gamescope_13003_info.log gamescope_13003.zip

Encountered on SteamOS 3.3.1 (20220812.101)

1basti1 commented 2 years ago

Hello, I don't have any logs because I wouldn't know where to find them. I'm just an end user.

So, for the past 3 weeks or so, I've also been experiencing odd crashes, mostly when I'm in-game and close Quick Settings. I can't reproduce it; it's completely random (it has happened maybe 4 times in these 3 weeks, in roughly 30 hours of gaming). The Deck restarts automatically, but the game is obviously closed.

I also sometimes get random black screens in the Game Mode main menu: I click something and the screen goes black, while controls and sound keep working. Good thing I remembered how to restart, because everything is fine after a restart.

Sterophonick commented 2 years ago

For me, it's happened way more often. I know I'm on an outdated build of the OS, but going on the Main branch leads to the integrated controllers hitching every so often, and it made shooters unplayable for me.

1basti1 commented 2 years ago

I'm on the latest beta build (IIRC this all started with the current beta update, but I could be wrong).

What do you mean by controllers hitching? I don't have any problems. At least I don't notice anything.

I know there were performance problems after the stable build got 3.3, but these are OK now.

Sterophonick commented 2 years ago

This only happens with the integrated controller, but the problem is how every so often the state of the controller seizes up for a split second. I have proof of this.

Integrated test: https://youtu.be/ToFnC9TDkbo DualSense test: https://youtu.be/49RnOXsGJc0 Trackpad demonstration: https://youtu.be/zU54BJ7IYGA

Notice how in the trackpad demonstration the pointer freezes up?

1basti1 commented 2 years ago

Oh, I see. But I don't think I have that. I would notice it; I'm very sensitive to even small frame time stutters. Strange indeed.

I would need to test it. Maybe I will later.

Sterophonick commented 2 years ago

This only happens under Main (20220830.1000) so I have no clue what the deal is. I'm kinda just waiting out SteamOS 3.4 and hoping for a fix. I don't really know who I can talk to at Valve about this.

failzers commented 2 years ago

Have been experiencing the same exact issue.

Sterophonick commented 2 years ago

gamescope_12923_info.log gamescope_12923_bt.log gamescope_12923.zip

So I tested it under 20220912 (currently under the Main branch) and I decided to record what happened when it crashed.

Video link

Edit: still happens on 20220914.1000

Sterophonick commented 2 years ago

Seems to be fixed as of https://github.com/Plagman/gamescope/commit/7b51f5964df1d2aa810428871d9d16e860f092df.

With it, I couldn't replicate the crash in either video. Keeping this open in case something happens though.

Update: Not fixed, false alarm.

1basti1 commented 2 years ago

How do you get that? Automatically?

Sterophonick commented 2 years ago

sudo pacman -Syu

also i just got it to trigger again by accident, so not fixed. blegh.

Sterophonick commented 2 years ago

Update: It appears to be caused by Decky Loader (https://github.com/SteamDeckHomebrew/decky-loader)

failzers commented 2 years ago

It isn't as far as they're aware. People in their discord have spoken about encountering it without it installed, and I've had a couple of first hand encounters with people who've never had it installed, ever.

misyltoad commented 2 years ago

It's definitely related to Decky. I was in a voice call with @Sterophonick while they went back and forth with it enabled and disabled several times, and the crash only reproduced with it enabled. It's definitely caused by that.

That's not to say it's the root cause; it may just be surfacing an existing problem. We probably shouldn't be crashing because of a client either way.

The backtrace is very strange: there's a wlr_surface with a bad vtable (it doesn't match the surface implementation), which causes a crash when setting up the wl_id.

#0  0x00007fbad0ffad22 in raise () at /usr/lib/libc.so.6
#1  0x00007fbad0fe4862 in abort () at /usr/lib/libc.so.6
#2  0x00007fbad0fe4747 in _nl_load_domain.cold () at /usr/lib/libc.so.6
#3  0x00007fbad0ff3616 in  () at /usr/lib/libc.so.6
#4  0x0000556345db9290 in wlr_surface_from_resource (resource=0x556346b39d60) at ../subprojects/wlroots/types/wlr_surface.c:612
#5  0x0000556345d3a36e in gamescope_xwayland_server_t::set_wl_id(wlserver_x11_surface_info*, unsigned int) (this=0x5563469e4c40, surf=0x7fbabe37dfa8, id=56)
    at ../src/wlserver.cpp:1210
#6  0x0000556345d18e27 in handle_wl_surface_id(xwayland_ctx_t*, win*, uint32_t) (ctx=0x7fbabc0772b0, w=0x7fbabe37de90, surfaceID=56)
    at ../src/steamcompmgr.cpp:3569
#7  0x0000556345d19302 in handle_client_message(xwayland_ctx_t*, XClientMessageEvent*) (ctx=0x7fbabc0772b0, ev=0x7fbab77fd8a0)
    at ../src/steamcompmgr.cpp:3697
#8  0x0000556345d1cee4 in dispatch_x11(xwayland_ctx_t*) (ctx=0x7fbabc0772b0) at ../src/steamcompmgr.cpp:4827
#9  0x0000556345d1f1e9 in steamcompmgr_main(int, char**) (argc=28, argv=0x7ffc77d947b8) at ../src/steamcompmgr.cpp:5373
#10 0x0000556345d36a1a in steamCompMgrThreadRun(int, char**) (argc=28, argv=0x7ffc77d947b8) at ../src/main.cpp:578
#11 0x0000556345d37187 in std::__invoke_impl<void, void (*)(int, char**), int, char**>(std::__invoke_other, void (*&&)(int, char**), int&&, char**&&) (__f=
    @0x556346c493b8: 0x556345d369e0 <steamCompMgrThreadRun(int, char**)>) at /usr/include/c++/11.1.0/bits/invoke.h:61
#12 0x0000556345d3709e in std::__invoke<void (*)(int, char**), int, char**>(void (*&&)(int, char**), int&&, char**&&) (__fn=
    @0x556346c493b8: 0x556345d369e0 <steamCompMgrThreadRun(int, char**)>) at /usr/include/c++/11.1.0/bits/invoke.h:96
#13 0x0000556345d36fd1 in std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>)
    (this=0x556346c493a8) at /usr/include/c++/11.1.0/bits/std_thread.h:253
#14 0x0000556345d36f6e in std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> >::operator()() (this=0x556346c493a8)
    at /usr/include/c++/11.1.0/bits/std_thread.h:260
#15 0x0000556345d36f52 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> > >::_M_run() (this=0x556346c493a0)
    at /usr/include/c++/11.1.0/bits/std_thread.h:211
#16 0x00007fbad13df3c4 in std::execute_native_thread_routine(void*) (__p=0x556346c493a0) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:82
#17 0x00007fbad1193259 in start_thread () at /usr/lib/libpthread.so.0
#18 0x00007fbad10bc5e3 in clone () at /usr/lib/libc.so.6

I made an ASAN build of Gamescope and they were still able to reproduce the crash, so it's not memory corruption or bad memory, which was my initial hunch. I have no idea what Decky does to cause this yet, but I guess I will install it and see where stuff falls apart.

misyltoad commented 2 years ago

I toggled the overlay over 300 times automatically while running HL2 and didn't get any crash.

I just installed Decky and ran the same script, and it crashed within several seconds. It's definitely related. :P

misyltoad commented 2 years ago

Okay, I may have found something after some more investigating with ASAN and Decky installed: https://github.com/Plagman/gamescope/commit/4f62e5d18ba20156348684cbebf29fc8b19c8745

This may fix the issue people are seeing. I am seemingly not getting crashes on overlay with it since...

NVM, that was just good luck, ugh. What the commit fixes was still a problem though! :p

failzers commented 2 years ago

That's not to say its the root cause, or just surfacing an existing problem or something. We probably shouldn't be crashing from a client either way.

True, true. I was just talking about some investigations the team has done where they've encountered it on a system without Decky installed. Great to see that we're getting somewhere though.

TrainDoctor commented 2 years ago

Please keep me and the rest of the decky-loader team posted on what we can do to help out. We're getting close to a full stable release and we'd love to address this issue before then.

misyltoad commented 2 years ago

I think it's just interfering with the timing of things, exposing a bug that always had the potential to happen but doesn't normally surface.

When the overlay opens, two 1x1 windows are created and then destroyed by steamwebhelper:

wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)

In the bad case it ends up looking like this:

wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7af960)
gamescope: types/wlr_surface.c:612: wlr_surface_from_resource: Assertion `wl_resource_instance_of(resource, &wl_surface_interface, &surface_implementation)' failed.
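To compare the two excerpts mechanically, here is a minimal sketch that pairs up the "New" and "Destroyed" lines and reports which surfaces are still alive at the end of a log. The regular expression is written against the exact line format quoted above; the names `SURFACE_EVENT`, `GOOD_LOG`, `BAD_LOG` and `live_surfaces` are made up for this illustration and are not part of gamescope or wlroots.

```python
import re

# Matches the wlroots debug lines quoted above, for example:
# "... New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)"
SURFACE_EVENT = re.compile(
    r"(New|Destroyed) wlr_surface (0x[0-9a-f]+) \(res (0x[0-9a-f]+)\)"
)

GOOD_LOG = """\
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)
""".splitlines()

BAD_LOG = """\
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7af960)
gamescope: types/wlr_surface.c:612: wlr_surface_from_resource: Assertion `wl_resource_instance_of(resource, &wl_surface_interface, &surface_implementation)' failed.
""".splitlines()


def live_surfaces(log_lines):
    """Return {resource: surface} for surfaces created but never destroyed."""
    live = {}
    for line in log_lines:
        match = SURFACE_EVENT.search(line)
        if match is None:
            continue  # unrelated line, such as the assertion message
        event, surface, resource = match.groups()
        if event == "New":
            live[resource] = surface
        else:
            live.pop(resource, None)
    return live


if __name__ == "__main__":
    print("good case, still alive:", live_surfaces(GOOD_LOG))
    print("bad case, still alive:", live_surfaces(BAD_LOG))
```

Run on the two excerpts, this only confirms what the logs already suggest: in the bad case the assertion fires while the second 1x1 surface is still alive, between its creation and its destruction. It says nothing about why the lookup returned the wrong kind of resource.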

I think what is happening here is the following:

But that doesn't make sense, because in wl_resource_destroy it seems like it frees the existing resource and inserts a NULL at the id, and we are doing this all in a lock so it can't be halfway through doing that or something either... https://github.com/wayland-project/wayland/blob/main/src/wayland-server.c#L754

So I am not too sure right now.
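For readers trying to picture the kind of race being discussed, here is a toy model of one possible interleaving. It is only a sketch under stated assumptions, not the confirmed root cause: it assumes a per-client object map in which a destroyed ID first reads as empty (the NULL entry mentioned above) and can later be handed to an object of a different type. `Resource`, `ObjectMap`, `surface_from_id` and `replay_race` are hypothetical names, and the assert merely stands in for the `wl_resource_instance_of` check in the backtrace.

```python
class Resource:
    """Stand-in for a protocol object; only its interface name matters here."""

    def __init__(self, interface):
        self.interface = interface


class ObjectMap:
    """Toy per-client object map in which destroyed IDs become reusable."""

    def __init__(self):
        self.objects = {}
        self.free_ids = []
        self.next_id = 1

    def create(self, interface):
        if self.free_ids:
            object_id = self.free_ids.pop()  # recycle a released ID
        else:
            object_id = self.next_id
            self.next_id += 1
        self.objects[object_id] = Resource(interface)
        return object_id

    def destroy(self, object_id):
        del self.objects[object_id]
        self.free_ids.append(object_id)

    def lookup(self, object_id):
        return self.objects.get(object_id)  # None while the ID is unused


def surface_from_id(object_map, object_id):
    resource = object_map.lookup(object_id)
    if resource is None:
        return None  # empty slot: harmless, the caller can retry later
    # Stand-in for the instance-of assertion seen in the backtrace.
    assert resource.interface == "wl_surface", "resource is not a wl_surface"
    return resource


def replay_race():
    """Replay the interleaving; return 'assertion' if the check trips."""
    client = ObjectMap()
    surface_id = client.create("wl_surface")
    queued_id = surface_id            # ID carried by a message still in flight
    client.destroy(surface_id)        # the 1x1 window goes away first
    client.create("wl_region")        # a different object takes over the ID
    try:
        surface_from_id(client, queued_id)
    except AssertionError:
        return "assertion"
    return "ok"


if __name__ == "__main__":
    print(replay_race())
```

In this model the empty-slot case is harmless, which matches the objection above about `wl_resource_destroy` inserting a NULL. Only the destroy-then-reuse ordering trips the check, and whether that ordering can actually occur here is exactly what remains unclear.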

misyltoad commented 2 years ago

I also tested whether we simply hadn't fully processed the surface creation yet, by flushing the Wayland events before set_wl_id was called, and that wasn't it either :thinking:

AAGaming00 commented 2 years ago

This may be caused by Decky's QAM injection causing the SP window to destroy the QAM window and create a new one. I can try and remove the window re-creation from Decky (it is just a side effect of how we inject into it) but this is likely still an issue in Gamescope as I've had it occur while in-game without ever opening menus.

AAGaming00 commented 2 years ago

I have a stashed half-working version of this (the QAM tabs will disappear sometimes but the window is never re-created) that I can build for you if it would be helpful.

I can also provide a debug function to cause that window re-creation next time the quick access menu is opened.

AAGaming00 commented 2 years ago

Does #623 fix this issue or is it unrelated?

failzers commented 2 years ago

but this is likely still an issue in Gamescope as I've had it occur while in-game without ever opening menus.

Yeah, it seems the most reproducible whilst opening overlays, but it does also crash whilst in game with no menus being displayed.

misyltoad commented 2 years ago

Does #623 fix this issue or is it unrelated?

It is unrelated.

misyltoad commented 2 years ago

This protocol https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/163

This xwayland PR https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/976

and this Gamescope PR https://github.com/Plagman/gamescope/tree/new-surface-association

Should properly solve the problem.
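The thread does not spell out how the linked changes work, so the following is only a sketch of the general idea, under the assumption that they associate an X11 window with its surface through a unique value that is never reused, rather than through a recyclable object ID. `SerialAssociation` and its methods are hypothetical names for this illustration, not the protocol's or Gamescope's API.

```python
import itertools


class SerialAssociation:
    """Pair windows and surfaces that announce the same never-reused serial."""

    def __init__(self):
        self.pending_surfaces = {}  # serial -> surface waiting for a window
        self.pending_windows = {}   # serial -> window waiting for a surface
        self.pairs = []

    def surface_announced(self, serial, surface):
        window = self.pending_windows.pop(serial, None)
        if window is None:
            self.pending_surfaces[serial] = surface
        else:
            self.pairs.append((window, surface))

    def window_announced(self, serial, window):
        surface = self.pending_surfaces.pop(serial, None)
        if surface is None:
            self.pending_windows[serial] = window
        else:
            self.pairs.append((window, surface))

    def surface_destroyed(self, serial):
        # A stale serial simply never matches anything again.
        self.pending_surfaces.pop(serial, None)


def replay_with_serials():
    serials = itertools.count(1)  # monotonically increasing, never recycled
    assoc = SerialAssociation()
    first, second = next(serials), next(serials)
    assoc.surface_announced(first, "surface-A")
    assoc.surface_destroyed(first)                # first 1x1 window is gone
    assoc.surface_announced(second, "surface-B")
    assoc.window_announced(first, "window-1")     # late message, stale serial
    assoc.window_announced(second, "window-2")
    return assoc


if __name__ == "__main__":
    print(replay_with_serials().pairs)
```

The point of the design is that a late message carrying a stale serial can at worst stay unmatched; it can never be resolved to an unrelated object the way a reused ID might be.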

dan3093 commented 2 years ago

@Joshua-Ashton Is your last comment a fix that I can deploy on my own steam deck? Do I just need to be patient and wait for the Decky Loader to get an update?

misyltoad commented 2 years ago

I would just wait; there are still a lot of moving parts.

infernn commented 2 years ago

I sometimes get a system reboot after I close a game, with a screen that says "verify installation". Is this problem related to Decky? Should I just disable CEF, or uninstall Decky completely?

dan3093 commented 2 years ago

@infernn I completely uninstalled decky after my last post and I have not experienced a single crash since doing so.

infernn commented 2 years ago

Can I just disable CEF in the options to try it, or do I have to uninstall Decky completely?

misyltoad commented 2 years ago

You can just disable the CEF option.

Sterophonick commented 2 years ago

Decky Loader has just pushed a commit that fixed their QAM injection. Closing.

misyltoad commented 2 years ago

This hasn't fixed the root cause.