ValveSoftware / SteamVR-for-Linux

Issue tracker for the Linux port of SteamVR

[BUG] System Lockups #488

Open gameboycjp opened 2 years ago

gameboycjp commented 2 years ago

Describe the bug: SteamVR locks my entire desktop environment up at random, sometimes on start.

To Reproduce: Steps to reproduce the behavior:

  1. At random while I'm playing VR.
  2. My screens and headset will lock up for 5-10 seconds, then black out for 5-10 seconds.
  3. The screens come back afterwards but stay frozen; only the mouse works. I have to switch to a TTY to reboot.
  4. See error

Expected behavior: SteamVR should not lock up the PC.

System Information (please complete the following information):

Screenshots: I will post relevant lines of logs instead.

Tue Dec 21 2021 20:10:53.739751 - CVulkanVRRenderer::UpdateBufferHelper. Ring Buffer Wrapped! (vkQueueWaitIdle)
Tue Dec 21 2021 20:14:02.428315 - Failed Watchdog timeout in thread Render in WaitEvent( CompositorCompute, 1 ) after 6.026912 seconds. Aborting.

Additional context: On rare occasions I get a single frame of screen tearing when I push my GPU hard. I'm really hoping the issue isn't the GPU, given the current circumstances.

Note: Commenters who are also experiencing this issue are encouraged to include the "System Information" section in their replies.

gameboycjp commented 2 years ago

Replacing my GPU did not stop the crashes, so I spent some time gathering logs. It looks like the GPU driver is crashing entirely. Should I report my errors elsewhere too? I only get this driver crash with SteamVR. Kernel is now 5.16.5-arch1-1, Mesa is now 21.3.5, and SteamVR is now 1.21.8.

The errors from dmesg are the closest I get to anything that seems useful:

[ 6124.254896] hrtimer: interrupt took 13740 ns

I think this was logged before the crash happened, but I'm leaving it here just in case.

[ 6428.415004] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 6428.418340] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 6433.335004] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3278639, emitted seq=3278641
[ 6433.335274] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process VRChat.exe pid 12695 thread dxvk-submit pid 12798
[ 6433.335516] amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
[ 6437.335535] amdgpu 0000:29:00.0: amdgpu: failed to suspend display audio
[ 6437.847035] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 6437.847229] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[ 6438.098306] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 6438.098503] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[ 6438.352938] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 6438.366780] [drm] free PSP TMR buffer
[ 6438.411303] amdgpu 0000:29:00.0: amdgpu: MODE1 reset
[ 6438.411308] amdgpu 0000:29:00.0: amdgpu: GPU mode1 reset
[ 6438.411374] amdgpu 0000:29:00.0: amdgpu: GPU smu mode1 reset

After the reset finishes, I get "[ 6439.544159] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!" over and over until I reset my system.
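The negative numbers in these messages are kernel error codes, i.e. negated errno values: -110 in the ring-test failures is ETIMEDOUT, and -125 from the command-submission parser is ECANCELED. A minimal Python sketch for decoding them from a saved log, assuming the generic Linux numbering from include/uapi/asm-generic/errno.h (other operating systems use different values); `decode_kernel_errno` and `errnos_in_line` are hypothetical helper names, and the table only covers the codes seen in this thread:

```python
import re
from typing import List

# Subset of the generic Linux errno numbering (include/uapi/asm-generic/errno.h).
# Only codes quoted in this thread are listed; other operating systems differ.
LINUX_ERRNO = {
    110: "ETIMEDOUT",  # "Connection timed out"
    125: "ECANCELED",  # "Operation canceled"
}

# A minus sign that is not part of a word or dotted number, followed by digits,
# e.g. "(-110)" or "parser -125!".
NEG_CODE_RE = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_kernel_errno(code: int) -> str:
    """Map a kernel return value such as -125 to its errno name (hypothetical helper)."""
    return LINUX_ERRNO.get(abs(code), "UNKNOWN")


def errnos_in_line(line: str) -> List[str]:
    """Decode every negative error code found in one dmesg line."""
    return [decode_kernel_errno(-int(n)) for n in NEG_CODE_RE.findall(line)]


if __name__ == "__main__":
    # prints ['ECANCELED']
    print(errnos_in_line("[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!"))
```

Mesa appears to print "The CS has been cancelled because the context is lost" when a submission fails with ECANCELED, so the repeating -125 is consistent with the game continuing to submit work on a GPU context that the reset invalidated.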

EDIT: I should mention that this happens with every game I've tried, but as far as I'm aware they're all running under Proton.

Supreeeme commented 2 years ago

Just wanted to add that I'm experiencing this too, but only with certain games. This GPU reset has happened to me in No Man's Sky, Jet Island, and Monster Showdown after a few (10-20) minutes of play, but I can play Beat Saber for hours.

kiosion commented 2 years ago

I'm experiencing something similar, though I'm not sure it's the same issue. I get display lockups seemingly at random on launch, but I don't need to reboot: my display starts responding again after about 5-10 seconds. FWIW I'm also using the Xanmod kernel, but with the NVIDIA driver, not Mesa.

Mrr7782 commented 2 years ago

Same here. PC Specs:

It took me about one or two hours to get SteamVR working with as few problems as possible, and I tested it on Beat Saber, which worked. That was yesterday. Today I wanted to try Half-Life: Alyx. The game launched and was stuck loading for about 40 s; once it loaded, it started playing the Valve logo (which didn't even show up in my headset), and 5 s later my PC froze, with only the cursor still working. I had to go to tty2 to restart sddm and reset the compositor, and then reboot, because my DE started tearing as if VSync were disabled, even though xrandr said my monitor was running at 60 Hz. After a few more attempts I tried installing the linux_v1.14 beta, which did the same thing, but immediately upon launching SteamVR. I then went back to regular SteamVR only to find that it now did the same thing: my whole PC would freeze as soon as I launched SteamVR.

Also, just like OP, I looked at dmesg and saw that amdgpu threw error -125 (failed to initialise parser), but since this driver reset seems to happen on both AMD and NVIDIA GPUs, as the comment above suggests, this shouldn't be the driver's fault. I thought it might also be my PSU, because it's pretty weak for my PC, but I figured a PSU problem would make the whole PC shut down rather than just restart the GPU driver, right?

I really hope this gets solved; I don't wanna have to go back to Windows, even if it's just for VR. I'll leave my PC off tonight and try again tomorrow. If I find anything interesting, I'll post it here.

kiosion commented 2 years ago

Definitely not a PSU issue by the sound of it. Maybe my system recovers on its own after a while because I'm not using a display manager? I did find that reinstalling or switching to a beta branch fixes this issue for me, but only for the launch immediately afterwards, the one with the permission prompt; I get the same result if I run SteamVR manually with sudo (not a good idea, though). If I were better at Linux troubleshooting I'd look into this further; it's been an annoying issue for quite some time now.

Mrr7782 commented 2 years ago

I installed steam-native-runtime, since I had it installed the last time I was able to launch SteamVR successfully, and tried the old 1.14 again. Same problem. This time, though, I wrote stdout and stderr to a text file.

Here are some things I found in the log:

./vrwebhelper: error while loading shared libraries: libcef.so: cannot open shared object file: No such file or directory
amdgpu: The CS has been cancelled because the context is lost.

The first one's weird as there's a library called libcef.so in the same directory as the vrwebhelper executable.
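A possible explanation, offered as an assumption rather than a diagnosis: the Linux dynamic loader does not search an executable's own directory by default. It only looks there if the binary has an RPATH/RUNPATH containing $ORIGIN, or if the directory is listed in LD_LIBRARY_PATH, so a libcef.so sitting next to vrwebhelper can still be reported as missing when the helper is started without the environment it expects. Running `ldd ./vrwebhelper` shows which libraries resolve to "not found". A minimal Python sketch that launches a binary with its own directory prepended to LD_LIBRARY_PATH, as a debugging aid only; `env_with_own_dir` and `run_with_own_dir` are hypothetical helpers, and nothing here reflects how SteamVR itself launches the helper:

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional


def env_with_own_dir(executable: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the executable's directory first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    # Resolve symlinks so the directory that actually contains the binary is used.
    exe_dir = str(Path(executable).resolve().parent)
    existing = env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is colon-separated; prepend so libraries next to the binary win.
    env["LD_LIBRARY_PATH"] = exe_dir + (":" + existing if existing else "")
    return env


def run_with_own_dir(executable: str, *args: str) -> int:
    """Run the binary with the adjusted environment and return its exit status."""
    return subprocess.run([executable, *args], env=env_with_own_dir(executable)).returncode
```

The original environment is copied rather than modified, so only the launched process sees the changed search path.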

I then went into tty2 and saved the output of dmesg, in which I found this right before error 125:

[drm] Skip scheduling IBs! (x445)

I uploaded the logs to this Git repository. Take a look; you might find something.

After testing again on stable SteamVR, instead of random colours in the headset I saw the grid that shows while something is loading, but only a single frame rendered before the GPU crashed again. This time, though, the GPU didn't manage to reset and I had to do a hard reset.

This time the libcef.so error didn't appear in the SteamVR log...

While I was writing this (I'm on my phone), my PC just reset. Before the kernel loaded, systemd printed "hardware error CPU1" or something like that; I didn't have enough time to read the whole thing, let alone write it down.

I'll commit the stable-branch log to my repo as well, of course, and will probably edit this comment later if I find something.

Since so many people have gotten this to work out of the box without any problems, and because my internet has been extremely slow for the past few days, I don't think it's worth trying the beta release. I'll try again tomorrow, as I saw that the Mesa package has an update. If that doesn't help, I'll try asking on Reddit.

Worst case scenario, I'll have to install Winbloat on my secondary HDD and see if it works there.

Edit0: I see SteamVR trying to load libraries from /opt/rocm-5.1.1/lib, but although that folder has libraries in it, I don't actually have ROCm installed. Since there are libraries in that folder, this shouldn't make any difference, but should I try installing ROCm?

Mrr7782 commented 2 years ago

Okay, I'm no longer even going to attempt to run SteamVR on Linux, because after some research and help from friends, it turns out the amdgpu driver's just shit. Most people, if not all, who have the same problems as me (a hardware error while booting after a reset) are on an RX 5xxx or RX 6xxx GPU. I'm sorry I couldn't really help anyone, but it looks like I'll unfortunately have to install Windows for VR gaming because of my hardware and the amdgpu driver.

Jibodeah commented 1 year ago

I've been having a similar (perhaps the same) issue where playing VRChat causes system lockups. I don't know if it's only VRChat; the one time I played a different game in VR (Phasmophobia), it didn't occur during a 2-hour session, which is inconclusive. Here's some of the syslog from one occasion:

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1429119, emitted seq=1429121
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process VRChat.exe pid 9075 thread dxvk-submit pid 9119
kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!

Then, a little later:

kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(2) failed

After that my system is basically unusable until I long-press the power button; I haven't been able to get the graphics working again.
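The two amdgpu_job_timedout lines carry most of the useful detail: which ring hung, how many fences had been emitted but not yet signaled when the timeout fired (emitted seq minus signaled seq), and which process and thread submitted the job. Here it is VRChat.exe's dxvk-submit thread, as in the first report. A minimal Python sketch that extracts those fields from a saved kernel log; the regexes are written against the lines quoted in this thread and are assumptions about the message text, not a stable kernel interface, and `parse_gpu_timeouts` is a hypothetical name:

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Patterns based on the amdgpu_job_timedout lines quoted in this thread (assumed format).
RING_RE = re.compile(
    r"amdgpu_job_timedout.*?ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROC_RE = re.compile(
    r"amdgpu_job_timedout.*?process (?P<process>\S+) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)"
)


@dataclass
class GpuTimeout:
    ring: str
    unsignaled: int  # emitted seq minus signaled seq when the timeout fired
    process: Optional[str] = None
    thread: Optional[str] = None


def parse_gpu_timeouts(lines: Iterable[str]) -> List[GpuTimeout]:
    """Collect ring timeouts and attach the process/thread line that follows each one."""
    events: List[GpuTimeout] = []
    for line in lines:
        m = RING_RE.search(line)
        if m:
            events.append(GpuTimeout(m["ring"], int(m["emitted"]) - int(m["signaled"])))
            continue
        m = PROC_RE.search(line)
        if m and events:
            events[-1].process = m["process"]
            events[-1].thread = m["thread"]
    return events
```

For example, after a hard reset the previous boot's kernel log can be saved with `journalctl -k -b -1 > kernel.log` (this only works if the systemd journal is persistent), and the open file passed to `parse_gpu_timeouts`.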

System specs:

I've compiled a whole list of things that don't fix it, which I will omit here but...

I tried creating a separate user account on my computer, just in case it was something strange about my user and... no crashes on this account. I did a fresh install of Steam, SteamVR and VRChat (and nothing else), and so far, with probably nearly 10 hours of playtime on this other account, no crashes. I did try going back to my regular user account, and it crashed again. Additionally, when playing on my usual (crashy) account I see green graphical artefacts in my headset during VRChat's loading screens, but I don't see them on the test (non-crashy) account. Perhaps they correlate with the issue.

So far I've been unable to figure out what exactly in my regular user account is causing the crashes. It doesn't seem to be SteamVR (I tried resetting it via this post). I tried wiping Steam's config (by moving ~/.local/share/Steam/config), but that didn't resolve the graphical artefacts, which I think correlate with the issue. The same goes for wiping the Vulkan config in ~/.local/share/vulkan. It may be worth noting that my test user account does not have superuser privileges, so SteamVR cannot perform the part of its setup that requires them.
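Moving a directory aside, as described here, is safer than deleting it, because it can be put back once it has been ruled out. A minimal Python sketch, assuming Steam and SteamVR are closed while it runs; `move_aside` and `restore` are hypothetical helpers, and the `.bak-<timestamp>` suffix is an arbitrary naming choice:

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def move_aside(path, stamp: Optional[str] = None) -> Optional[Path]:
    """Rename `path` to `<name>.bak-<stamp>` next to it; return the backup path.

    Returns None if `path` does not exist. Refuses to overwrite an existing backup.
    """
    src = Path(path).expanduser()
    if not src.exists():
        return None
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{stamp}")
    if dst.exists():
        raise FileExistsError(dst)
    shutil.move(str(src), str(dst))
    return dst


def restore(backup, original) -> None:
    """Undo move_aside; fails if the application has recreated the original in the meantime."""
    dst = Path(original).expanduser()
    if dst.exists():
        raise FileExistsError(dst)
    shutil.move(str(backup), str(dst))
```

For example, `backup = move_aside("~/.local/share/vulkan")` before a test session, then `restore(backup, "~/.local/share/vulkan")` if nothing changes. If the application recreated the directory in the meantime, remove or rename the new copy before restoring.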

Finding out that it works on an alternate user has turned this issue from a real bummer into a moderate annoyance at worst, but it would still be good to figure out what exactly it is so I can do away with the separate user account.

farmboy0 commented 1 year ago

Please check out the following bug reports:

https://gitlab.freedesktop.org/drm/amd/-/issues/2113
https://gitlab.freedesktop.org/drm/amd/-/issues/2135

CorneliusCornbread commented 1 year ago

Having a similar issue on Fedora. These are some of the errors Fedora's problem reporter reports back to me:

WARNING: CPU: 8 PID: 194 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hwseq.c:115 dcn20_setup_gsl_group_as_lock+0x81/0x220 [amdgpu]

WARNING: CPU: 11 PID: 4941 at drivers/gpu/drm/drm_vblank.c:728 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x28c/0x3a0

WARNING: CPU: 0 PID: 19 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]

My lockups typically require a full system power-down to recover; oftentimes the system is so badly locked up that I can't even switch to a TTY with the keyboard.

NewtSoup commented 1 year ago

Describe the bug: SteamVR locks my entire desktop environment up at random.

To Reproduce: Steps to reproduce the behavior:

At random while I'm playing VR: my VR view freezes, then my screens go black and my GPU fan maxes out. I have to hit the reset button to recover.

Expected behavior: SteamVR should not lock up the PC.

System Information:

Distribution: Ubuntu 22.04

SteamVR version: 1.26.5

Steam client version: 1688171965 (Steam Client Build Date: Sat, Jul 1 1:19 AM UTC -08:00; Steam Web Build Date: Sat, Jul 1 12:51 AM UTC -08:00; Steam API Version: SteamClient020)

GPU: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.7, DRM 3.42, 5.15.0-76-generic)

Opted into Steam client beta?: Yes

Graphics driver version: Mesa 22.2.5-0ubuntu0.1~22.04.3

Gist for SteamVR System Information: No logs.

Additional context: Although this bug is random in other games, it is reliably repeatable in Subnautica's VR mode. It happens every single time I approach the north island where the Aurora is supposed to crash (this is game-breaking, as visiting the island is required).

gusthe commented 1 year ago

This could be related to Async Reprojection. I have not had a lockup in over a week with Async Reprojection disabled in SteamVR.

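To disable Async Reprojection, you have to edit the steamvr.vrsettings file in the Steam config folder:

~/.steam/steam/config/steamvr.vrsettings

and set this key in its "steamvr" section (the file is JSON, so the last entry in a section must not have a trailing comma):

   "steamvr" : {
      "enableLinuxVulkanAsync" : false
   },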
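The same change can be scripted, which also avoids hand-editing mistakes such as a stray trailing comma. A minimal Python sketch, assuming steamvr.vrsettings is plain JSON at the path given above and that SteamVR is closed while it runs (SteamVR may write its settings back on exit); `set_vulkan_async` is a hypothetical helper:

```python
import json
import shutil
from pathlib import Path

# Path given in the comment above.
DEFAULT_PATH = Path("~/.steam/steam/config/steamvr.vrsettings").expanduser()


def set_vulkan_async(enabled: bool, path: Path = DEFAULT_PATH) -> dict:
    """Set steamvr.enableLinuxVulkanAsync, keeping a .bak copy of the previous file."""
    path = Path(path)
    settings = {}
    if path.exists():
        # json.loads raises JSONDecodeError on a malformed file (e.g. a trailing comma),
        # so a broken file is reported instead of being silently overwritten.
        settings = json.loads(path.read_text())
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    # Create the "steamvr" section if it is missing; every other key is left untouched.
    settings.setdefault("steamvr", {})["enableLinuxVulkanAsync"] = enabled
    path.write_text(json.dumps(settings, indent=3) + "\n")
    return settings
```

Call `set_vulkan_async(False)` to disable Async Reprojection and `set_vulkan_async(True)` to turn it back on; the previous file is kept as steamvr.vrsettings.bak.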
NewtSoup commented 1 year ago

Replying to https://github.com/ValveSoftware/SteamVR-for-Linux/issues/488#issuecomment-1637183374

Thank you, I will definitely give this a try. It's also reassuring to know that you have had lockups too, because that suggests it's less likely to be my hardware.

Addendum: Sadly, this did not work for me: I got terrible image stuttering, which made me feel dizzy very quickly.

gusthe commented 1 year ago

Replying to #488 (comment)

Yeah, disabling Async Reprojection can cause motion sickness. Is there any way for you to check whether it stops the lockups you're experiencing? Maybe lowering the quality settings would help with the stutter. If Async Reprojection is the cause, Valve may be able to fix it.

I'm also using an RX 6700 XT, so our issue is probably the same.

NewtSoup commented 1 year ago

Replying to https://github.com/ValveSoftware/SteamVR-for-Linux/issues/488#issuecomment-1646046129

I have tried again with Async Reprojection off, and I don't think I explained what I see correctly. It's not so much stuttering as multiple images whenever I move in the game or turn my head. It's the same sort of thing as in the old days when, if you moved your mouse cursor too fast, you'd see multiple pointer images. Turning all the game settings down to zero did not help. I see this effect even in the menus of FO4VR, Skyrim VR and Ultrawings 2. Turning Async off renders these games unplayable for me. I think for the moment I will have to resign myself to hitting the reset button: sometimes I can play for hours, sometimes it will lock up four times in the space of 10 minutes. There is no pattern to it, except, as I said, in Subnautica at the north island.

Incidentally, with Async on I see multicoloured sparkles in the SteamVR library (I don't use Home). These disappear with Async off. They don't appear in-game, so I've never worried about them.

Addendum: I have tried different refresh rates with Async off, to no avail. I can only play for about 5 minutes before I'm too sick to continue, so I don't think I can play long enough to test whether Async is the cause of my lockups.