ValveSoftware / SteamVR-for-Linux

Issue tracker for the Linux port of SteamVR

Other Applications are getting Corrupted Visuals #183

Closed Goofybud16 closed 4 years ago

Goofybud16 commented 5 years ago

Your system information

Please describe your issue in as much detail as possible:

When playing games in SteamVR for an extended period of time with use of the SteamVR Overlay, I notice graphical corruption in other applications. For example, Solaar starts to look like

image

instead of the usual

image

The media player widget on my desktop has also had some corruption, now appearing as

image

This affects all parts of Solaar, including the tray icon, tray menu, and main application menu. Closing and reopening the window does not fix the problem; only fully restarting the affected application does. On 1.2.10, I occasionally get small graphical corruption in Steam, while 1.3.7 appears to affect every application and causes corrupted fonts and images. I've even had it cause my desktop background to appear corrupted at one point.

EDIT: It seems to be related to using the SteamVR overlay. I didn't use it on 1.2.10 due to stability issues (caused games to hang when closing sometimes), but started using it on Beta 1.3.7 since it no longer caused games to hang.

Steps for reproducing this issue:

  1. Open some other applications on the system
  2. Open SteamVR
  3. Open a game (I tested with VRChat)
  4. Play the game for a few hours while heavily using the Steam Overlay for Steam, Steam Chat, and Desktop mode.

Goofybud16 commented 5 years ago

After some further playtime, I think it might be related to the Steam Overlay.

On 1.2.10, I almost never used the overlay because it would crash the game.

On 1.3.7, it usually doesn't (but occasionally does cause very poor performance) so I do use the overlay.

During the times when I get corrupted visuals, I have usually used the overlay (and desktop mode) many times while ingame.

I'm not completely sure that is the cause, but it seems likely: when I don't use the overlay at all, I don't seem to get corrupted graphics in games.

Goofybud16 commented 5 years ago

After further testing, it does seem to be related to the overlay.

I have started having an intermittent issue where, after some number of uses, using the overlay causes bad game performance (i.e., halving the framerate). I've stopped using the overlay for the most part, and since then, the graphical corruption has stopped.

Goofybud16 commented 5 years ago

I noticed this happen again with 1.3.21, but not yet with 1.3.22. It doesn't seem to be very consistent, as I can go weeks without having an issue until it suddenly corrupts a ton of stuff.

Goofybud16 commented 5 years ago

I have had it with 1.3.22 now. It is still corrupting visuals, both in and outside of the game.

image

lostgoat commented 5 years ago

Can you let us know whether the corruption is seen just on the Steam client, or whether other applications are affected as well?

Goofybud16 commented 5 years ago

So far with 1.3.22 I have only noticed it within the game (avatars, textures, menus, etc. get corrupted in VRChat), SteamVR (the empty "loading room" shown when a game is lagging or loading gets corrupted, as do the base station and controller models), and the Steam client.

On previous versions of SteamVR, I was seeing it in other applications as well. It would corrupt my desktop background, fonts in other applications, etc.

The corruption happens irregularly: I've sometimes gone several days to a week without any issues, then corruption suddenly appears. The effect doesn't always seem to be consistent either; sometimes just the game and SteamVR get corrupted, sometimes other applications do as well. I'll try to screenshot and report back if I notice anything outside of Steam/SteamVR/the current VR game getting corrupted on 1.3.22.

Goofybud16 commented 5 years ago

Just noticed some corruption on 1.4.2 outside of VR:

image

Goofybud16 commented 5 years ago

Today, at some point, the SteamVR overlay crashed/broke. When pressing the system button, nothing happened. When I exited the game, the headset went black. During the game, however, everything appeared fine.

I still noticed graphical corruption happening. This makes me think it is either a Proton bug, VRChat bug, or SteamVR itself, not the overlay (since I am 100% sure I didn't even open it today).

Goofybud16 commented 5 years ago

Just had it happen again with v1.4.4. After a very long play session yesterday, my desktop looked like this:

steamvr graphical corruption

(I took the screenshot this morning).

Restarting Plasma 5 fixed the graphical corruption.

lostgoat commented 5 years ago

@Goofybud16 next time it happens, can I trouble you to check dmesg for messages related to GPU hang/recovery?

Recent versions of the amdgpu driver have enabled GPU recovery, so what used to be a hang now results in a GPU reset. But that means the VRAM contents might be lost, and if the application isn't aware that this happened, it can result in this issue.

This would also mean that the root cause of the problem is in SteamVR causing a GPU hang.
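The check requested above can be sketched as a small filter over saved `dmesg` output. This is only an illustration: the keyword list is an assumption (the exact wording of amdgpu messages varies between kernel versions), and `find_gpu_events` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import re

# Assumed keywords, not an official list of amdgpu messages.
# Adjust them to match what your kernel actually prints.
KEYWORDS = re.compile(r"gpu (fault|reset|hang)|ring \w+ timeout", re.IGNORECASE)


def find_gpu_events(dmesg_text):
    """Return amdgpu lines that mention a fault, reset, hang, or ring timeout."""
    hits = []
    for line in dmesg_text.splitlines():
        # Only consider lines from the amdgpu driver, then match the keywords.
        if "amdgpu" in line and KEYWORDS.search(line):
            hits.append(line.strip())
    return hits
```

To use it, capture the kernel log as text (for example the output of `sudo dmesg`) and pass it to `find_gpu_events`; an empty result only means none of the assumed keywords matched, not that no hang occurred.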

Goofybud16 commented 5 years ago

GPU Hang

I manually enabled amdgpu.gpu_recovery with a kernel boot parameter. I was having problems with SteamVR causing hangs in previous versions, and that enabled the resets to work. Whenever a reset occurs, I usually notice because the recovery code doesn't entirely work in my kernel. It usually kills my VR game completely, the headset goes black, and my desktop comes back, although with massively corrupted and unusable contents. It still saves the system enough that I can get to a TTY and issue a proper safe shutdown, however.
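For anyone wanting to confirm whether the parameter is actually set on their boot, here is a minimal sketch that looks for `amdgpu.gpu_recovery` on the kernel command line. The function names are hypothetical, and the value `1` used in the usage note is an assumption for illustration; the comment above does not say which value was used.

```python
def gpu_recovery_setting(cmdline):
    """Return the value given to amdgpu.gpu_recovery on a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only accept the parameter when it is written as key=value.
        if key == "amdgpu.gpu_recovery" and sep:
            return value
    return None


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running Linux kernel was booted with."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```

Calling `gpu_recovery_setting(read_kernel_cmdline())` on a machine booted with, say, `amdgpu.gpu_recovery=1` would return the string `"1"`; `None` means the parameter was not passed at boot and the driver is using its default.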

dmesg

I'll check that now, since I haven't rebooted since it got corrupted.

$ sudo dmesg | grep amdgpu
[normal boot snip]
[    8.440906] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:23:00.0 on minor 0
[171851.449455] amdgpu 0000:23:00.0: GPU fault detected: 146 0x02821014 for process plasmashell pid 1570 thread plasmashel:cs0 pid 1727
[171851.449459] amdgpu 0000:23:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00103850
[171851.449462] amdgpu 0000:23:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0F010014
[171851.449465] amdgpu 0000:23:00.0: VM fault (0x14, vmid 7, pasid 32773) at page 1062992, write from 'CB7' (0x43423700) (16)
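The fault lines above can be picked apart mechanically. The following is a sketch that only assumes the line layout shown in this particular log (other kernel versions may format it differently); `parse_fault` and `decode_client` are hypothetical names. It also shows that the hex value printed after `'CB7'` is simply the same tag as ASCII bytes, which is an observation about this log line rather than documented behaviour.

```python
import re

# Patterns written against the two log lines quoted above (an assumption
# about the format, not a stable kernel interface).
FAULT_RE = re.compile(
    r"GPU fault detected: \d+ 0x[0-9a-fA-F]+ "
    r"for process (?P<process>\S+) pid (?P<pid>\d+)"
)
VM_RE = re.compile(
    r"VM fault \(0x[0-9a-fA-F]+, vmid (?P<vmid>\d+), pasid (?P<pasid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<client>[^']*)' \((?P<client_hex>0x[0-9a-fA-F]+)\)"
)


def parse_fault(lines):
    """Collect the named fields from any matching fault lines into one dict."""
    info = {}
    for line in lines:
        for pattern in (FAULT_RE, VM_RE):
            match = pattern.search(line)
            if match:
                info.update(match.groupdict())
    return info


def decode_client(client_hex):
    """Interpret a 32-bit hex value as up to four ASCII characters."""
    raw = int(client_hex, 16).to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")
```

Applied to the log above, this reports that the faulting process was `plasmashell` (the desktop shell, not the game) and that the access was a write. The page number 1062992 is 0x103850 in hex, the same value as the `VM_CONTEXT1_PROTECTION_FAULT_ADDR` line.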

As for the actual behavior of the corruption in-game, it seems to have changed recently. It used to be just completely missing models and corrupted textures. Now I'll notice one or two corrupted textures here or there, and then within 30 minutes or so (maybe less? Hard to keep track of time in VR) avatars tend to start having polygons go everywhere.

Here is an example of a corrupted texture in-game: image

And here is the same avatar, but not a part with corruption:

image

The corrupted textures also seemed to change based on distance and rotation of the camera.

Here's an example of polygons going everywhere:

image

I've also noticed that sometimes the graphical bugs don't show up on the in-game camera: they show up in the "viewfinder," but not in the actual PNG saved to disk. Sometimes they do, however, which is how I was able to grab the earlier screenshots.

The corruption seems to come and go; I can sometimes play for multiple 10+ hour sessions with no problems before there is suddenly corruption. If I restart the game when I first notice it, it usually goes away without a fight. If I wait for it to get really bad (as seen in the "polygons going everywhere" screenshot), it starts to affect other applications.

I'm not sure if it is relevant or not, but I'm trying to include anything that might be useful in figuring this out: I also sometimes see weird graphical corruption in SteamVR Home, even before starting another game. Everything is fine at first, but the longer I'm in it, the more I see random bugs where polygons seem to just disappear for a few frames. Small holes appear in walls very briefly (maybe just a few frames at most) and then everything is fine. It doesn't seem to affect anything else, so I've mostly been ignoring it, as I don't really use SteamVR Home very much compared to other games.

lostgoat commented 5 years ago

Thanks for the extra info.

> The corruption seems to come and go; I can sometimes play for multiple 10+ hour sessions with no problems before there is suddenly corruption.

The effects of lost VRAM can be very random. During a reset any part of the VRAM could have changed, so the app may start reading bad texture data, or even execute wrong code due to bad instruction buffers.

Sometimes it can self heal as assets reload naturally. Sometimes it can get worse if a bad pointer makes the app slowly start writing to bad places in memory.

Squashing the hangs is going to be necessary here :)

Goofybud16 commented 5 years ago

Just had a GPU hang. Managed to remote in and grab a dmesg log before rebooting.

https://gist.github.com/Goofybud16/8fff7cbe082ba2757554b745d4aa459c

This is with SteamVR 1.5.1 beta.

Goofybud16 commented 5 years ago

This is still happening with 1.5.14. SteamVR gets corrupted visuals in the "fallback" world, the game gets corrupted visuals, and other applications get corrupted visuals.

I'm not 100% sure, at this point, if it is hardware failure or not. It almost seems like VRAM is getting randomly corrupted; however, I'm fairly certain that I'm not getting any GPU resets.

lostgoat commented 5 years ago

@Goofybud16 the problem isn't your hardware, I can repro the issue on my card as well.

A fix will be on the way.

JulianGro commented 5 years ago

I have a similar issue on NVIDIA, though the Steam client is the only thing affected, and it only happens right after closing SteamVR. It seems to take a random avatar picture from my friends list and show it all over the client.

Bildschirmfoto vom 2019-06-14 00-26-11

I guess, once a fix is out, I will report whether it fixes this issue as well.

kisak-valve commented 5 years ago

Hello @jug007, per "Fixed 'psychedelic' colours in the Steam client caused by exiting SteamVR." in the SteamVR 1.6.1 beta update, please opt into SteamVR's beta and retest your issue.

Goofybud16 commented 5 years ago

After upgrading to a Radeon VII and Debian's Linux-5.0-Trunk kernel from Experimental, I stopped noticing this issue under 1.5.17. I have yet to reproduce it in 1.6.1 either. Still, I've previously gone a week or two without this issue occurring.

I have noticed, however, a possibly related issue where there seems to be a single frame of garbage shown when closing the Steam Overlay. With the VII, I'm basically running reprojection 100% of the time (due to a CPU bottleneck possibly caused by an old Mesa + slightly old kernel). It seems like the first frame to be displayed once back in the game is just garbage from VRAM or something, then it is back to being fine again.

JulianGro commented 5 years ago

SteamVR 1.6.1 still has the issue on my end with NVIDIA. Since the Steam client shows other odd behavior, specifically in the friends list, this might not necessarily be a problem with SteamVR. I guess I should open an issue for the odd behavior of the Steam client before digging deeper into this specific issue after quitting SteamVR.

h1z1 commented 5 years ago

I've had this happen, but only with multiple GPUs (not in SLI). It seemed to happen more when the clocks were not locked to exactly the same speed. Running them in VMs worked around it (or otherwise preventing them all from appearing to one instance of the NVIDIA driver, binding with vfio for example).

Goofybud16 commented 5 years ago

I just had the issue again under 1.6.7. Steam became corrupted.

Interestingly, I just upgraded my system to an R9 3900X.

With my R7 1700 + R9 Fury, I had the problem really badly.

With the R7 1700 + Radeon VII, I didn't see the problem.

With the R9 3900X + Radeon VII, I have started to see it, although it doesn't seem to be that bad yet.

With the 1700 + VII, I usually got around 30-40 FPS most of the time, as I was recording and CPU limited. With the 3900X, I now get a fairly solid 45 with dips below. I've also noticed that the graphical corruption sometimes seems to appear after some stuttering; e.g., the game drops some frames, then recovers, and stuff is corrupted. Could the corruption possibly be caused by the transition in/out of reprojection?

SlickMcRunFast commented 5 years ago

I have this issue when exiting SteamVR: it takes a friend's profile picture and uses it as the background for everything. NVIDIA 1060 and 2070 Super (swapped) on 430.34, Pop 19.04.

lostgoat commented 5 years ago

Hey Everyone,

When corrupted visuals occur, it is usually due to SteamVR causing a GPU hang. Then the driver will perform a GPU reset and other apps will have bad values in memory (Linux GPU drivers don't support proper app recovery after a GPU reset yet).

Hence, corrupted visuals reported here may stem from different underlying problems. To help sort things out, when commenting on this issue please include the information from the issue template: https://github.com/ValveSoftware/SteamVR-for-Linux/blob/master/issue_template.md

Having this information will help us properly identify the issues and address them.

JulianGro commented 5 years ago

Your system information

Please describe your issue in as much detail as possible:

When exiting SteamVR, sometimes a random avatar from my friends list gets pasted all over the Steam client. No other applications seem affected. This has always been happening to me (for maybe a year now), with no change across different driver versions. https://user-images.githubusercontent.com/11144627/60335878-2d472d80-999f-11e9-918b-8e1516f17b19.png

Steps for reproducing this issue:

  1. Start Steam and SteamVR
  2. Exit SteamVR

The issue only appears sometimes (maybe 1 out of 20 times or so).

Goofybud16 commented 4 years ago

I don't think I've really had issues with this recently that I can recall.

Going to a Radeon VII from a Radeon R9 Fury was a huge reduction in this issue. After that, it only occurred very occasionally.

Now it seems to almost never happen.

Current Mesa: driverInfo = Mesa 19.2.4 (LLVM 9.0.0)
Kernel: 5.2.0-3-amd64 #1 SMP Debian 5.2.17-1 (2019-09-26) x86_64 GNU/Linux

Even when I do get what I suspect is a GPU reset (headset does something funky, screens blank out for a second, notification from KDE when it comes back), everything seems to keep running smoothly (even the VR application). Even then, I can only think of maybe once that it's happened in.... a month?

I don't recall the last time I've seen corrupted visuals at this point; it was likely some time back around July or August. Even with uptime regularly hitting 20-25 days with daily multi-hour VR sessions, everything seems to be rather stable.

kedodrill commented 4 years ago

@Goofybud16 Are you still having this issue?

Goofybud16 commented 4 years ago

I don't believe I've seen any issues since my last post. Since nobody else has responded since, it may be time to close the issue.

kisak-valve commented 4 years ago

Closing per the last comment.