alvr-org / ALVR

Stream VR games from your PC to your headset via Wi-Fi
MIT License

GPU crashes sometimes before streaming started #920

Closed mittorn closed 1 year ago

mittorn commented 2 years ago

Description

Roughly once every 5-10 launches, the GPU hangs in the vrcompositor.real process when the headset connects:

[  140.325221] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.0.0 timeout, signaled seq=2, emitted seq=4
[  140.325295] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vrcompositor.re pid 5468 thread vrcomposit:cs0 pid 5477
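For anyone collecting these hangs from dmesg, here is a minimal Python sketch that pulls the ring name and sequence numbers out of a timeout line like the one above. The regex is written only against the two log lines quoted in this report; `RingTimeout` and `parse_ring_timeout` are hypothetical helper names, and other kernel versions may format the message differently.

```python
import re
from typing import NamedTuple, Optional


class RingTimeout(NamedTuple):
    timestamp: float  # seconds since boot, as printed by dmesg
    ring: str         # e.g. "comp_1.0.0" or a gfx ring name
    signaled: int     # last sequence number the GPU completed
    emitted: int      # last sequence number the driver submitted


# Matches the amdgpu "ring ... timeout" message format quoted in this issue.
_TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Return the parsed timeout, or None if the line is a different message."""
    m = _TIMEOUT_RE.search(line)
    if m is None:
        return None
    return RingTimeout(
        float(m["ts"]), m["ring"], int(m["signaled"]), int(m["emitted"])
    )
```

The difference `emitted - signaled` gives the number of submitted jobs that had not completed when the timeout fired, and the ring name tells a compute ring hang apart from a gfx ring hang.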

On the 6600 XT, if SteamVR Home is enabled, a gfx ring hang is detected before the comp ring hang, because the comp ring has a higher hang timeout. On the 580 the gfx ring does not hang, so the desktop is still usable if GPU recovery is disabled.

It is possibly related to the Vulkan layer, maybe some sync issues. Hang frequency depends on bitrate. A release build from source crashes almost every time; the Ubuntu nightly build crashes less often, but maybe that is just luck.

General Troubleshooting

Environment

Hardware

Note: for Linux, an upload to the hw-probe database is preferred: hw-probe -all -upload

https://linux-hardware.org/?probe=73773b7de6

CPU: AMD Ryzen 1600X

GPU: RX580, RX6600 XT

Audio: jack2 daemon routing alvr playback to capture inputs and playback outputs to alvr capture

Installation

ALVR Version: 16 release, 17 nightly, current git, git revision from current nightly

SteamVR Version: 1.14

Install Type: zip and source; a release build from source is more likely to crash

OS Name and Version (winver on Windows or grep PRETTY_NAME /etc/os-release on most Linux distributions): Gentoo Linux, mesa git, Linux 5.14. SteamVR is on an HDD, so reading some files may be delayed

mittorn commented 2 years ago

The release version of SteamVR causes the crash too.

Kirottu commented 2 years ago

Try disabling SteamVR Home, that caused this issue for me.

mittorn commented 2 years ago

It hangs with SteamVR Home disabled; with Home enabled I got a hang in the gfx ring too. The issue is related to LinuxVulkanAsync and the realtime priority obtained via "setcap".

After disabling the setcap script, removing the "eip" capability from vrcompositor-launcher, setting linuxVulkanAsync to false, and updating SteamVR, it seems to work correctly (Home does not hang it either).

Only disabling linuxVulkanAsync results in some internal vrcompositor crash (but I have not tried with the recent SteamVR beta).

Only disabling setcap results in a crash in SDL_SetThreadPriority (but this seems to be fixed in the latest SteamVR update).

After patching vrcompositor to not call SDL_SetThreadPriority, it works correctly even with SteamVR 1.14.
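To confirm that the setcap workaround above actually took effect, one option is to check whether the running compositor process still holds CAP_SYS_NICE. The following is a rough Python sketch, not something from SteamVR or ALVR: it reads the `CapEff` mask from `/proc/<pid>/status` and tests bit 23, which is CAP_SYS_NICE in the Linux capability numbering. The function names are made up for illustration, and the pid has to be looked up separately.

```python
CAP_SYS_NICE = 23  # bit index of CAP_SYS_NICE in the Linux capability sets


def effective_caps(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal number.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap_sys_nice(status_text: str) -> bool:
    """True if the effective set in the given status text includes CAP_SYS_NICE."""
    return bool((effective_caps(status_text) >> CAP_SYS_NICE) & 1)


def process_has_cap_sys_nice(pid: int) -> bool:
    """Read /proc/<pid>/status (Linux only) and test for CAP_SYS_NICE."""
    with open(f"/proc/{pid}/status") as f:
        return has_cap_sys_nice(f.read())
```

If `process_has_cap_sys_nice` returns False for the vrcompositor pid, the process is running without the capability that setcap would have granted, which is the state described above as working.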

betaRadiation commented 2 years ago

Could you explain to me how to actually do any of this? Because Steamvr is currently crashing my gpu.

gardotd426 commented 2 years ago

@mittorn it would help if you could explain to others how you are patching a binary file, since SteamVR isn't open-source and vrcompositor is an ELF executable binary file. Saying "patching X fixes the issue" without any extra information doesn't help others who may not know what you're talking about.

mittorn commented 2 years ago

I replaced the string in the symbol table with a harmless math function, using a hex editor.

mittorn commented 2 years ago

Recent SteamVR versions do not need this; they have a fixed SDL2 implementation.

depau commented 2 years ago

You seem to have a similar setup to what I have, the GPU used to crash but it no longer does for me, consider updating kernel+mesa: I'm on Linux 5.17.9 + Mesa 22.1.0.

It still doesn't work but at least it doesn't reset the GPU.

thorsten-passfeld commented 1 year ago

Using ALVR version 19, this issue happens with both my 1080 Ti and RX 6800 XT. My GPU crashes and forces me to manually turn off my PC just because I took off my Quest 2 and put it back on again, triggering a new connection due to the proximity sensor re-enabling the screen(s). This sucks because it means that I can never safely take off my headset without fearing that my whole PC locks up.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

thorsten-passfeld commented 1 year ago

Did #1558 fix this issue? @Vixea Sounds vaguely related from the commit message.

Vixea commented 1 year ago

No, this is for closing ALVR. I'm not sure if this issue still applies, as there have been a few changes in related areas of ALVR. @mittorn can you please send a status update on whether it still happens in the latest nightly? Otherwise I'll assume it's fixed.

mittorn commented 1 year ago

This was solved for me by disabling setcap in the vrstartup.sh script, but it may still happen when realtime capabilities are enabled. The latest version I tested without disabling setcap was some early 0.20 nightly, so I'm unsure if it is fixed now. It may crash when vrcompositor runs in realtime mode and starts rendering an image. Also, vrcompositor in realtime mode renders overlays and the grid incorrectly. I'll retest it today.
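For anyone retesting, a quick way to see whether a process is actually running with a realtime scheduling policy is to query each of its threads. This is a small Python sketch under the assumption that "realtime mode" here means the SCHED_FIFO or SCHED_RR policies; the helper names are hypothetical, and the policy numbers are the standard Linux values.

```python
import os

# Linux may OR this flag into the value returned by sched_getscheduler.
SCHED_RESET_ON_FORK = 0x40000000

# Standard Linux scheduling policy numbers.
POLICY_NAMES = {
    0: "SCHED_OTHER",
    1: "SCHED_FIFO",
    2: "SCHED_RR",
    3: "SCHED_BATCH",
    5: "SCHED_IDLE",
    6: "SCHED_DEADLINE",
}


def policy_name(raw_policy: int) -> str:
    """Translate a raw policy value into a readable name."""
    policy = raw_policy & ~SCHED_RESET_ON_FORK
    return POLICY_NAMES.get(policy, f"unknown({policy})")


def is_realtime(raw_policy: int) -> bool:
    """True for the two fixed-priority realtime policies."""
    return policy_name(raw_policy) in ("SCHED_FIFO", "SCHED_RR")


def thread_policies(pid: int) -> dict:
    """Map each thread id of `pid` to its policy name (Linux only)."""
    return {
        int(tid): policy_name(os.sched_getscheduler(int(tid)))
        for tid in os.listdir(f"/proc/{pid}/task")
    }
```

Scheduling policy is per thread on Linux, so `thread_policies` walks `/proc/<pid>/task` instead of checking only the main thread.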

Vixea commented 1 year ago

Ah yea, this issue should be resolved in that case. You can always reopen it if that is not the case.

mittorn commented 1 year ago

Now I have a different GPU. Checked again: after applying setcap CAP_SYS_NICE=eip linux64/vrcompositor-launcher, the vrcompositor frame time becomes 5ms; without setcap it's 0.9. I do not see broken tracking now like before, but that's probably because of the faster GPU.

mittorn commented 1 year ago

When setcap is enabled and async mode is disabled, tracking seems to be broken too. When async mode is enabled, I do not see frame errors on the performance graph, but if the framerate drops, the client does not apply reprojection correctly. I did not get any GPU crash this time.