ValveSoftware / SteamVR-for-Linux

Issue tracker for the Linux port of SteamVR

SteamVR breaks after kill -9 of OpenVR/OpenXR clients #479

Open ChristophHaag opened 3 years ago

ChristophHaag commented 3 years ago

Killing any OpenXR client with kill -9 may break SteamVR such that vrclient.so segfaults every time an OpenXR application is started afterwards.

To Reproduce

Steps to reproduce the behavior:

  1. Run XR_RUNTIME_JSON=~/.steam/steam/steamapps/common/SteamVR/steamxr_linux64.json hello_xr -G Vulkan2
  2. Kill it with pkill -9 hello_xr. Do NOT press Enter to quit hello_xr, because that puts it into the hanging state described in #422. The failure described here only happens when the client is killed while running.
  3. Go back to step 1 and try to start it again. Wait 1-2 seconds and it may crash.

It doesn't happen every time a client is killed, but after killing it 1-2 times you should be unable to start it again, and it fails with this crash:

[23:23:23.810][Info   ] XrEventDataSessionStateChanged: state XR_SESSION_STATE_READY->XR_SESSION_STATE_SYNCHRONIZED session=93825002742416 time=271091379539043
[New Thread 0x7fffea639640 (LWP 514592)]
Sun Nov 14 2021 23:23:23.813247 - IPC: recovering abandoned mutex 0x7ffff5ea25a4

Thread 4 "hello_xr" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea639640 (LWP 514592)]
0x00007ffff699b318 in LfMutexUnlockRobust(LfMutex*) () from /home/haagch-collabora/.local/share/Steam/steamapps/common/SteamVR/bin/linux64/vrclient.so
(gdb) bt
#0  0x00007ffff699b318 in LfMutexUnlockRobust(LfMutex*) () from /home/haagch-collabora/.local/share/Steam/steamapps/common/SteamVR/bin/linux64/vrclient.so
#1  0x00007ffff68fe8fa in CSyncLockThread::Run() () from /home/haagch-collabora/.local/share/Steam/steamapps/common/SteamVR/bin/linux64/vrclient.so
#2  0x00007ffff6979079 in SteamThreadTools::CThread::ThreadProc(void*) () from /home/haagch-collabora/.local/share/Steam/steamapps/common/SteamVR/bin/linux64/vrclient.so
#3  0x00007ffff7e9d259 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007ffff78a85e3 in clone () from /usr/lib/libc.so.6

Restarting SteamVR fixes this issue and OpenXR applications can be started again.

LubosD commented 3 years ago

I returned to playing games after a few months and decided to do some VR. Then everything started crashing, including SteamVR Home and I have a ton of these in my dmesg:

[ 2362.660994] VKRenderThread[31883]: segfault at 7f8dcd39f918 ip 00007fa05019f318 sp 00007fa009abed90 error 6 in vrclient.so[7fa04ff68000+6ab000]

After a quick IDA session, I determined the crash to be inside LfMutexUnlockRobust(). Restarting SteamVR completely does resolve the issue until it starts happening again.

So this is definitely a very serious bug affecting all VR apps/games on Linux.

ChristophHaag commented 3 years ago

Good to know, I haven't tested this with OpenVR clients but I do remember seeing these crashes in LfMutexUnlockRobust from time to time with OpenVR apps. Never took the time to reproduce them though.

amalon commented 3 years ago

I am also experiencing this same behaviour. For me it's FlightGear, which I'm adding VR support to, that triggers it 100% of the time when I SIGINT it. No OpenXR app seems to work afterwards until SteamVR is restarted. Very occasionally I can get it to run a second time by closing it cleanly until it reaches the xrDestroySession hang and SteamVR Home starts, then killing it.

Kagukara commented 1 year ago

Getting the same problem running xr-video-player via SteamVR.

More on the issue here: https://codeberg.org/yoshino/xr-video-player/issues/3

yshui commented 1 year ago

I don't think you need to kill -9. I am seeing this after closing The Lab (450390) normally. ValveSoftware/openvr#1790

yshui commented 1 year ago

I was trying to make The Lab work under Proton, and because it starts mini-games as new processes, this bug is a blocker. I think I am mostly there; this bug is the last thing stopping it from working.

Can this get some attention from Valve?

yshui commented 1 year ago

I did notice this line in the log:

Tue Nov 14 2023 19:31:17.218369 [Info] - IPC: recovering abandoned mutex 0x7f303f1121a8

which has an address matching the argument to LfMutexUnlockRobust:

(gdb) p/x $r12
$3 = 0x7f303f1121a8

Perhaps vrclient is trying to reuse an invalid mutex.
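
For context, the "recovering abandoned mutex" message resembles what a robust, process-shared mutex reports when its owner dies while holding the lock. LfMutex is internal to vrclient.so and its implementation is not public, so the following is only a sketch of the analogous POSIX mechanism, not SteamVR's code. The function name recover_abandoned_mutex_demo is hypothetical, and the anonymous shared mapping merely stands in for whatever shared-memory segment the runtime actually uses.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child dies (SIGKILL) while holding a process-shared
 * robust mutex, and the parent recovers it.
 * Returns 0 if the abandoned mutex was recovered, -1 otherwise. */
int recover_abandoned_mutex_demo(void)
{
    /* Assumption: an anonymous shared mapping stands in for the IPC segment. */
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0 ||
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0 ||
        pthread_mutex_init(m, &attr) != 0) {
        munmap(m, sizeof *m);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    pid_t child = fork();
    if (child < 0) {
        pthread_mutex_destroy(m);
        munmap(m, sizeof *m);
        return -1;
    }
    if (child == 0) {
        /* Child: take the lock, then die without unlocking (like kill -9). */
        pthread_mutex_lock(m);
        raise(SIGKILL);
        _exit(1); /* not reached */
    }

    int status;
    waitpid(child, &status, 0);

    int result = -1;
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        /* The previous owner died. Any state guarded by the mutex would be
         * repaired here, before the mutex is marked usable again. */
        if (pthread_mutex_consistent(m) == 0)
            result = 0;
        pthread_mutex_unlock(m);
    } else if (rc == 0) {
        /* Lock was not abandoned; nothing to recover. */
        pthread_mutex_unlock(m);
    }

    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return result;
}
```

With POSIX robust mutexes, unlocking after EOWNERDEAD without first calling pthread_mutex_consistent leaves the mutex permanently unusable, so a recovery path that skips or mishandles that step would fail for every later client. Whether something comparable happens inside LfMutexUnlockRobust is speculation.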

radomskist commented 6 months ago

This happens in OpenXR if xrDestroySession is not called when the application exits. I wouldn't consider it related to kill -9, because signal handlers can destroy the XrSession.

However, it is certainly strange that the entire SteamVR runtime breaks across the system if one app doesn't destroy its XrSession.
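
As a client-side mitigation, an application can catch SIGINT and SIGTERM and tear down its session before exiting. The sketch below is a minimal illustration, assuming a frame loop that polls a flag. destroy_xr_session is a hypothetical stand-in for the application's real xrDestroySession and xrDestroyInstance calls, and install_quit_handlers and run_frame_loop are likewise illustrative names. The handler only sets a flag, since OpenXR calls are not async-signal-safe and should not be made from inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the signal handler, polled by the frame loop. */
static volatile sig_atomic_t quit_requested = 0;

/* Counts teardown calls so the sketch can be checked without an XR runtime. */
static int sessions_destroyed = 0;

static void on_quit_signal(int signo)
{
    (void)signo;
    quit_requested = 1; /* only async-signal-safe work happens here */
}

/* Installs the handler for SIGINT and SIGTERM. Returns 0 on success. */
int install_quit_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_quit_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Hypothetical stand-in for xrDestroySession() / xrDestroyInstance(). */
static void destroy_xr_session(void)
{
    sessions_destroyed++;
}

/* Runs until a quit signal arrives or max_frames is reached, then tears
 * the session down from normal (non-handler) context.
 * Returns the number of frames that were run. */
int run_frame_loop(int max_frames)
{
    int frames = 0;
    while (!quit_requested && frames < max_frames) {
        /* xrWaitFrame / xrBeginFrame / xrEndFrame would go here. */
        frames++;
    }
    destroy_xr_session();
    return frames;
}
```

This covers SIGINT and SIGTERM only. SIGKILL (kill -9) cannot be caught, blocked, or ignored, so no handler of this kind can run in the reproduction steps from the original report, and a crash in the client would skip the cleanup as well.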