ValveSoftware / SteamVR-for-Linux

Issue tracker for the Linux port of SteamVR
918 stars 45 forks source link

[BUG] Random Flashing with Async Reprojection Enabled #230

Open Goofybud16 opened 5 years ago

Goofybud16 commented 5 years ago

Describe the bug While Async reprojection is enabled, I notice random flashing in some titles (such as VRChat). It seems that random garbage frames are inter-spaced with normal frames for a second or so, and it is very disorienting. It does not effect the SteamVR dashboard. I could see the flashing frames happen behind the desktop view. It happens every few minutes, somewhat randomly.

To Reproduce Steps to reproduce the behavior:

  1. Play a game that drops in and out of reprojection territory (preferrably down to the 1/2 framerate range, so 45 on 90Hz mode, 60 on 120Hz mode)

Expected behavior There are no random garbage images flashed in-between normal behavior.

Screenshots Recorded the headset mirror during the flashing, it didn't show up there. Not sure of a different way to capture it

System Information (please complete the following information):

Note: Commenters who are also experiencing this issue are encouraged to include the "System Information" section in their replies.

Goofybud16 commented 5 years ago

Adding "enableLinuxVulkanAsync" : false appears to correct the issue, although Async reprojection is not enabled.

ryao commented 5 years ago

@Goofybud16 How do you add "enableLinuxVulkanAsync" : false?

Goofybud16 commented 5 years ago

Edit the steamvr section of ~/.steam/steam/config/steamvr.vrsettings

   "steamvr" : {
      "enableLinuxVulkanAsync" : false
   }
Goofybud16 commented 5 years ago

This is still an issue with 1.7.13.

Goofybud16 commented 4 years ago

Still an issue with 1.8.2.

VRChat still has corrupted frames (I've found opening the game makes it happen very consistently). I managed to capture a recording of it happening through-the-lens on the game startup. https://youtu.be/BiqM1ksudD0

Beat Saber has 'stuttery' frames, in that it seems to jump backwards a few frames randomly. I attempted to capture a GPU trace, but the debug commands webpage wouldn't load. These wonky frames don't seem to show up when recording the mirror view. Just for reference, with Async Reprojection disabled, Beat Saber runs completely 100% smooth.

lostgoat commented 4 years ago

@Goofybud16 can I trouble you to get a gpuvis capture while running vrchat?

Goofybud16 commented 4 years ago

I can't seem to get the debug commands page to load.

It seems to just immediately close the connection and the page never loads.

image

lostgoat commented 4 years ago

I need to update the instructions for perf data capture. The steamvr webserver recently added some extra security policies that prevent it from loading outside of vrwebhelper.

For a cli method you can run kill -10 $(pgrep -f gpu-trace) to trigger a capture.

For a gui method: -> Open SteamVR -> Open the vrmonitor hamburger menu -> Developer -> Debug Commands

lostgoat commented 4 years ago

Instructions updated here: https://github.com/ValveSoftware/SteamVR-for-Linux/blob/master/CapturingPerfData.md

Goofybud16 commented 4 years ago

Finally had the time to do the capture: https://drive.google.com/open?id=1r6JsB5ELoJ1gfQj6tB8utdvNsBUh1XBM

In this case, it seems to have rendered a solid blue image in the left eye (but not the right eye) upon startup. It didn't flicker, but I got an eyeful of the VRChat loading screen blue color.

lostgoat commented 4 years ago

Hey @Goofybud16

In that trace I'm seeing some big sdma usage by Xorg, which ends up delaying the compositor reprojection/presentation work past vsync every once in a while.

What does your vram utilization look like while playing vrchat?

As far as I can see you have a Radeon VII correct? And that should be 16GB of VRAM so it shouldn't really be full. I'm a not really sure yet why xorg is doing such large sdma jobs in this case.

Goofybud16 commented 4 years ago

As far as I can see you have a Radeon VII correct?

Correct. Radeon VII + R9 3900X, 32GB of system RAM. Running out of memory shouldn't be an issue (even though I somehow do sometimes).

I do run 3x 1080p monitors (all in 60Hz mode) alongside my Index.

What does your vram utilization look like while playing vrchat?

Where would be the best place to check that?

Also, I have the weird issues while just at the loading screen, before any worlds have even been loaded. Seems like VRam usage shouldn't be that high yet.

lostgoat commented 4 years ago

Where would be the best place to check that?

I like to use umr which is a cli app for amdgpu debugging.

You can run sudo umr top and you'll get some output like this: https://gist.github.com/lostgoat/e18b0261f5d908f5c2ee2d1ff6871158

Note that the vram section is hidden by default, you need to press 'v' to enable it.

I'm not sure if Debian has a package for umr, but building it is pretty simple: https://gitlab.freedesktop.org/tomstdenis/umr

And for arch users there is: https://aur.archlinux.org/packages/umr-git/

Goofybud16 commented 4 years ago

I managed to build umr.

While just sitting idle at my desktop (Firefox and a few other programs open) I get:

https://gist.github.com/Goofybud16/c731d0a76675dea9d80b04eff09522a8

Launching SteamVR, I get:

https://gist.github.com/Goofybud16/68c495f6fe75e264e136fb7039b1db1a

And then VRChat:

https://gist.github.com/Goofybud16/9b44ecc63dacdb444e83f3f8b4057833

Looks like I've got ~3GB used of 16, so not enough to run out of VRAM.

Thinking about it now, could OBS be causing the issues? I usually record my gameplay, and so I usually have OBS running (even if not recording, just forget to shut it down) and it has an Xcomposite Window Capture for VRChat's window.

Goofybud16 commented 4 years ago

Update:

With SteamVR 1.9.4, I still get corrupted visuals. It seems to only be during extreme lag (such as loading a world and game startup in VRChat).

The corrupted visuals are not nearly as bad as they were in previous tests though. The screen no longer flashes white with garbage data.

Since my last report, I've replaced my motherboard (now an MSI MPG X570 Gaming Edge), upgraded to the latest 1.0.0.4B AGESA for the board (was on an X370 SLI PLUS with the 1st available Ryzen 3000 BIOS).

I've also upgraded to Linux 5.4 [running a custom-built mainline. Default Debian 5.2.0-3 settings, everything else default, signing disabled].

Interestingly, it seems that Vivecraft (Which is OpenGL) runs perfectly. I played for an extended period of time, and with Kernel 5.4, it was doing extremely well. No real issues with jitter, stutter, or lag.

Beat Saber was mostly good, but had an occational frame where it felt like the frame pacing went 1,2,3,4,3,6,7.... It wasn't consistent, and wasn't always noticable in the middle of gameplay, but it was there.

VRChat had it much worse, sometimes getting multiple incorrect frames in a row. Specifically, I tested with the "Home of the Time" map. I didn't observe any corrupted visuals, just seemingly incorrectly re-projected frames. I suspect this is similar (or the same issue as) #269.

The stuttering in the grid with no games running appears to be fixed.

System report (with Async disabled): https://gist.github.com/Goofybud16/17fc3b571dc5f8b082d9ae736720f2a6

Goofybud16 commented 4 years ago

I tested again on 1.10.30.

Overall, getting Async enabled [including setting CAP_SYS_NICE] seems to have in some cases, doubled framerates in some worlds in VRChat.

I still get occasional glitchy/flashy frames, but they mostly seem to be upon loading into larger, unoptimized worlds. They're gone within a second or two.

As for "misprojected" frames, it seems that there's something strange going on.

image

In VRChat Worlds "Colors of Spring" and "Island 1336," this seems particularly noticable. It's fairly easy to reproduce by scrolling side-to-side on one of the columns very quickly in VRChat's "Worlds" menu while in one of the two worlds. Just scrolling with a joystick isn't enough, you've gotta grab and fling it really fast. Turn your head at the same time, and you'll see the frames "juttering" forwards and backwards.

One world I went from barely getting 20 FPS to getting 40, and a much smoother experience.

In worlds that run at full-speed [like "Box Avoid Game", which is highly optimized and runs at a near constant 120] I didn't experience any hitching.

Some worlds went from around 60 FPS to 120.

Goofybud16 commented 4 years ago

Tested again in 1.11.9

Beat Saber is mostly OK-- whenever a Steam notification popped up [those seem to have started working again in 1.11.9!] framerates would tank notably-- felt like down to maybe 30.

There was stuttering, and the occasional mis-reprojected frame-- it felt like the game's frame-pacing went A B C D E C F G on occasion. It wasn't as bad as it's been in the past, however.

The game maintains a solid very smooth feeling with Async totally disabled.

VRChat was about as bad as usual.

Loading into the game caused what felt like a "brighter" version of the image to flicker in and out-- this seems to happen whenever the game does a significant amount of loading [such as transitioning in or out of a loading screen]. It's somewhat disorienting and blinding.

It also seems to mis-project frames, but [likely] due to longer frametimes [game running at 40-60 FPS instead of Beat Saber's what feels like 120] it's much more noticeable. In my case, it was very notable in the Home of the Time world.

In the default VRChat home, it wasn't that bad [although that world already bounces off of 120 FPS]

Scrolling around in the menus in the game doesn't seem to trigger the issue like it did before-- although VRChat has since upgraded to Unity 2018, which may be causing the difference there as well.

Goofybud16 commented 4 years ago

Tested with 1.12.5.

System report

Up to Debian kernel 5.6.7, and manually updated to the latest AMDGPU firmware from upstream git.

I've also upgraded to a Gigabyte Aorus X570 Elite WiFi from my old, slightly problematic MSI X570 Gaming Edge WiFi. I've got my 3900X at all stock beyond virtualization on [and running some VMs in the background] and RAM at 3600MHz XMP profile.

I installed Feral's Game Mode, and used gamemoderun %command% to run SteamVR. Testing with this configuration and gamemoderun enabled, it seems that the misprojected frames have decreased from before slightly. When both VRChat and SteamVR were run with gamemoderun, some worlds had significantly reduced FPS [40-ish]

When I ran only SteamVR with gamemoderun and let the game run on it's own, those worlds seemed to run much better [reported 60-61 FPS ingame], and with far fewer mis-projected frames than previously. The only frames I came across that were "corrupted" like I'd previously observed were in the left eye and during the world transition-- you hit "Go" -> fades to black -> fades from black into world. It seems almost like someone took the frame, selected a chunk of it, and turned brightness way up. I believe [but could be mistaken] that there seemed to be almost tearing with the "bright frames"-- the top 3/4 of it was brightened, but then somewhere in the frame, it was normal again. Like a partially rendered frame was combined with an additive transform or something? It's only ever in the left eye, never the right.

Turning on OBS and starting a recording would increase the number of mis-projected frames slightly.

The frames that were misprojected weren't nearly as bad in VRChat as they once were.

I also tested Beat Saber, and while there were still mis-projected frames, they were not quite as bad as they have been in the past. If it at all matters, I picked up Beat Saber for the first time in weeks, maybe a month, and immediately beat 3 high scores with async on, while previously I'd strugged to achieve scores I had with it off.

I'd tried Half-Life Alyx native with reprojection on [before using gamemode] and was able to trigger a lot of misprojected frames around the time that the drone drops the ladder in the intro.

Wild speculation: My suspicion, since GameModeRun helped, and the fact that it seems like potentially correctly reprojected frames are ending up in the video stream later than they should, is some kind of scheduling issue-- seems to me that async reprojection may be somehow getting pre-empted, something else runs, and when it gets re-scheduled, it properly displays the frame [but now too late].

Is there any system configuration variables that may be useful for me to list? Scheduler, scheduler config, etc? I'm currently on what should be fairly vanilla Debian.

Would it be possible to add a "deadline" of sorts? If reprojection doesn't hit that deadline, it doesn't present the frame [as it would be too late to show it anyways] That doesn't solve the original scheduling issue, but it might make the symptoms a bit better [a frame or two of stutter instead of a frame or two injected in the wrong order, which IMO is far, far more disorienting.]

When reprojection works, it works very well. It makes things feel much smoother, and I think it may have reached a point where I can live with it as it is-- it bugs out a tad on joining a world in VRChat, but is fairly smooth sailing after that in most scenarios.

Goofybud16 commented 4 years ago

I played for a few hours with it enabled last night.

I identified the pattern for when the flashing occurs.

It seems to happen whenever the game stops rendering frames for too long-- when normally you'd get the "stretchy image" from SteamVR where it will bend the frame a little, you instead get flashy stretchy image.

This usually happened when a new avatar loaded in for the first time-- or right as a world was loading in for the first time. I believe the game hangs as it's either (re)compiling shaders, or when it's uploading some data to VRAM to render the frame [IE the shader it just compiled, textures, etc].

Whenever the game would hang for a significant amount of time [>some fraction of a second], async reprojection would seem to take over and cause bright flashing frames, and it did eventually start happening in both eyes. It's rather disorienting. It's a great measure for "the game quit rendering frames" but not so great for my eyes.

Goofybud16 commented 4 years ago

The mis-projected frames that I see look a lot like #68

I don't believe that includes the flashing I see when the game stops rendering for a short period [when the game icon gets displayed onscreen and you get the stretchy image].

I will say that I found some places that reproduce it very well, to the point where I had to turn async back off because it was so bad.

Goofybud16 commented 3 years ago

I've recently upgraded to a Radeon 5700XT.

I flipped async reprojection on for 1.14.16, and found that it doesn't work correctly. It still has issues with flashing frames, portions of the frame being too dark/too bright, and for some reason was forcing the game to reprojection [IE 60FPS at 120Hz] even when it was making the frametimes successfully:

VRChat's FPS counter, as well as the frame time graph confirmed that it should have been running at native 120FPS, but was instead holding 60 for some reason.

image

Goofybud16 commented 3 years ago

I've made a number of updates to my system software-wise [and a few tweaks hardware-wise] since my last post, so I figured I'd update here; along with a new trace of the issue happening in Half-Life: Alyx (should be the native version). This was taken just waving my head/the headset around at the main menu.

CPU: Ryzen 9 3900X Motherboard: Gigabyte X570 Aorus Elite WiFi, BIOS revision F33e (ComboPI 1.2.0.1) RAM: 64GB DDR4-3600MHz XMP RAM GPU: Red Devil 5700XT, "Silent" BIOS PSU: Corsair RMx 850w [PSU upgrade seems to have resolved a lot of VR-related crashes] Storage: 1TB 970 Evo Plus as / mdraid-6 as /home Mesa 21 from Debian Experimental, otherwise Debian Testing system

System Report GPUVis Trace

The best description I can give is just that the async reprojection is just a few frames too late, and reprojects the past into the future, which is moderately disorienting.

ZarathustraDK commented 3 years ago

The best description I can give is just that the async reprojection is just a few frames too late, and reprojects the past into the future, which is moderately disorienting.

Probably the most apt description of the problem I've heard so far. Experiencing the same problem on a similar setup. Switching to legacy reprojection feels less jarring than current asynchronous reprojection.

CPU: Ryzen 3900X Motherboard: Asus Prime x570 Pro RAM: 3733MHz XMP RAMP GPU: Asus 6800 XT TUF OC Storage: Corsair MP600 1TB Mesa 20.3.4-2 Proton 6.3-1 SteamVR 1.16

EDIT: Went and installed Pop_OS! 20.10 because I got a hold of a Ryzen 5900x, same problem there.

Goofybud16 commented 3 years ago

Since I switched to my 5700XT, I've been having issues with intermittent Machine Checks crashing my machine while gaming. As part of debugging this, I started using some tools to investigate GPU/VRAM usage.

I decided to poke around and investigate my VRAM usage (using umr --top and radeontop), and found that collapsing my 3 Firefox windows into 1 with Simple Tab Groups was able to reduce my VRAM usage a notable amount (prior idle usage is now idle usage w/ SteamVR running). Also closed all visible Steam windows, leaving only the tray icon. Usage with just SteamVR running is now +/- 3.5GB, leaving +/-4 GB until it hits the point where performance starts to fall.

After extended sessions in VRChat, I've noticed that my VRAM gets full (north of 7.5/8GB), at which point, it seems that the GTT starts to fill up. When VRAM is full, my frames absolutely tank. I'm assuming that VRAM fills up due to VRChat, for some reason, using a lot and then overflowing into system RAM (GTT). Slowly, over time, it will fill up all 16GB of VRAM + GTT.

I remembered earlier, that @ lostgoat had said

In that trace I'm seeing some big sdma usage by Xorg, which ends up delaying the compositor reprojection/presentation work past vsync every once in a while.

After doing the VRAM consumption tuning, I re-enabled async reprojection, and quit noticing as many issues with it. I'm on SteamVR 1.16.10 at this point. Could it be that the SDMA transfers are related to the GPU essentially "swapping" VRAM into system RAM? Maybe even swapping the async reprojection data/code itself, resulting in the delay?

That would explain why it seems alright at first in VRChat, and only seems to kick in after a little while (after it eats enough VRAM to hit the threshold where problems start.)

With the lower VRAM usage, I was able to complete a song or two in Audica + a few songs in Beat Saber with minimal to no stuttering (there were a few stutters in Beat Saber, but not too bad), and explore a few VRChat worlds without any significantly worse performance issues. A world I'd previously bounce 60-120 FPS in now held 120 FPS, and when I enabled a mirror, Async reprojection seemed to kick in and work properly without significant reprojected frames.

Other (changed) system info: Mainline 5.12.6 kernel (copied Debian 5.10 config + make olddefconfig) Debian Experimental's Mesa 21.1.0

Goofybud16 commented 3 years ago

I've noticed a new bug and had to re-disable Async reprojection.

It seems that after the first start of SteamVR (so the second start without a reboot) it will crash and leave a glitched image in the headset.

It leaves a hung vrcompositor process that is so stuck it can even stall shutdowns, and starts printing messages from the kernel.

Further attempts to start it without a reboot can cause the machine to lock-up completely.

This could be a kernel issue, though. If I have time, I may attempt to bisect to find out where it is, and if it does turn out to be a kernel issue, likely need to file a bug in the kernel? I wouldn't think the kernel should easily allow SteamVR to get that horribly stuck just by starting it twice.

DASPRiD commented 3 years ago

With async reprojection now available on NVIDIA cards, I can confirm that the issue exists there as well.

Goofybud16 commented 3 years ago

I tested again with the following configuration:

Apologies for the poor recording, I was trying to fit my phone up to the lens as best I could.

https://youtu.be/VdX0tlwLIIk

It's... A tad broken. With no app open, there's a ton of graphical issues. The whole screen going grey is because only one base station was on-- so the headset was losing tracking. Even that had pretty severe issues.

If I "enableLinuxVulkanAsync: false" then it goes away and works just fine (which how I've been playing).

I haven't tested in a game yet, but even just in the empty backing world, things seem to be... Not good.

dirkson commented 3 years ago

I also have this issue. It's pretty awful.

Goofybud16 commented 3 years ago

I tested again with a brand new RX 6700XT.

Mesa 21.2.0, SteamVR 1.19.7, Mainline kernel 5.13.13 (Linux version 5.13.13 (stephen@stephen-Desktop) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2))

The visual corruption in the "empty void" has gone away, but is still present (as well as mis-projected frames) in applications.

The "empty void" with the colorful hills and the circles seems to now have tracking lag-- which is not present when Async reprojection is switched off.

Goofybud16 commented 3 years ago

Something I've found that can alter the performance characteristics a bit:

I use Radeon-profile to set my GPU into "VR" power mode (locks the clock speeds) image

In SteamVR Settings I used the Throttling control settings to add an extra ~8.33ms for prediction for the application (VRChat) Screenshot_20210828_182545

With Async reprojection off, this has resulted in a massive frame increase for me-- "VR" mode on the GPU increases frame consistency, and the 8.33ms increases the framerates a notable amount-- taking worlds that previously could barely hit 60 FPS to nearly hitting 60 FPS with a full definition mirror on (which saps a lot of performance).

With Async reprojection enabled, that same world now hits 70-80 FPS (mirror off), however the mis-projected frames become painfully obvious (they happen nearly continuously).

Purely conjecture here: Is there a possibility that the throttling control algorithm/reprojection timing algorithms need adjusted better for Linux? Is there maybe some kind of system configuration or kernel patch related to that which would cause a relatively stock system to maybe not work as well as one would expect?

It seems like, despite my 5700XT being able to comfortably hold 120 FPS in some places in VRC on Windows, my 6700XT struggles to maintain 60FPS in those same places on Linux unless additional tuning is applied to compensate (such as the additional prediction ms, or the FSR-enabled SteamVR library which gave notable performance improvements in some cases).

Goofybud16 commented 2 years ago

An additional item I've found that improves smoothness and reduces (but not eliminates) the issues with Async Reprojection: A 1000Hz kernel. I was running at the Debian stock 250Hz; once I changed to

CONFIG_HZ_1000=y
CONFIG_HZ=1000

in my kernel .config (and re-built and installed the new kernel and rebooted), SteamVR became a lot smoother. Frame times improved (from bouncing at 60-120 FPS to holding a steady 120 FPS in some places).

I guess it makes sense... If I understand correctly: 250Hz (kernel) / 120Hz (Index) = 2.08; roughly 2 "time slices" per vsync; and it doesn't divide evenly which may factor into the juddering. Meanwhile, 1000/120 = 8.333 "slices" per vsync; allowing more precise timing for SteamVR; as well as allowing the game to potentially get more overall CPU time to render (IE 6 'slices' at ~1 ms per slice for a total of ~6ms vs 1 'slice' at ~4ms for a total of 4ms CPU time per-frame).

While async reprojection definitely still misbehaves in a number of ways, the misprojected frames seem to be significantly reduced.

When the game suddenly changes performance characteristics or drops a lot of frames (such as an avatar loading in in VRChat; or enabling a mirror) the corrupted frames will return. It feels almost like it's async reprojection trying to "re-calibrate"; and after a few seconds, it goes away and the game begins to run smoothly, with a very occasional mis-projected frame. The mis-projected frames also don't seem to be as bad when they do occur as they were before.

I'm testing with SteamVR 1.20.2 on kernel 5.13.19.

Termuellinator commented 2 years ago

Edit the steamvr section of ~/.steam/steam/config/steamvr.vrsettings

   "steamvr" : {
      "enableLinuxVulkanAsync" : false
   }

it seems this no longer works? At least in my case the file is overwritten/newly created on every steamVR startup - even when chmod 555 is applied? Edit: ok, might be something weird on my side - the settings won't open at all, so thats not a good sign i'd say -.- Edit2: missing settings was caused by https://github.com/ValveSoftware/SteamVR-for-Linux/issues/465 - still, steamvr.vrsettings is recreated on steamvr startup if i include "enableLinuxVulkanAsync" : false in the steamvr section

Goofybud16 commented 2 years ago

SteamVR seems to continuously re-write the steamvr.vrsettings file; however if you properly insert the "enableLinuxVulkanAsync" line; it will persist as SteamVR updates the file.

Are you adding a new "steamvr" section, or adding that line to the existing one? There should already be a "steamvr" section, and you should be adding that line to it, something like this:

   "steamvr" : {
      "analogGain" : <snip>,
      "enableHomeApp" : false,
      "enableLinuxVulkanAsync" : false,
      "installID" : "<snip>",
      "ipdOffset" : <snip>,
<snip>
    },

It should also be noted that you must close SteamVR prior to editing this file.

Termuellinator commented 2 years ago

Ah, i got the isse when looking at your snippet - every line but the last one has to be terminated with a comma. I simply added "enableLinuxVulkanAsync" : false after the last line in the steamvr section - if i add a comma to the previous line the change is kept :+1: