Closed matte-schwartz closed 1 month ago
well, using mangohud to monitor UI performance ended up throwing a slight wrench in the report... the UI seems to lag the most for me only while presenting mangohud. if mangohud is not on screen the UI performance is much closer to expectations. force compositing still fixes the performance when showing mangohud though
was able to narrow down the effects of this revert further to VRR being enabled on panels that support it. if VRR is on, the revert brings the steam ui performance up from a shakey ~48fps to a stable ~60fps at 3440x1440@60hz on a 7900xtx within gamescope-session
with the new adaptive_sync
convar in gamescope, this issue is now much easier to repro on OLED Deck. Set Mangohud overlay to full while the screen hasn't entered a \[drm:amdgpu_dm_crtc_vblank_control_worker \[amdgpu\]\] Allow idle optimizations (MALL): 0
state and you will see the UI fps tank with gamescopectl adaptive_sync 1
and then shoot back up to 90fps with gamescopectl adaptive_sync 0
@matte-schwartz Is this all tested on embedded gamescope (aka "gaming mode" on steam deck)? Especially w/ embedded mode, you could try profiling w/ gpuvis: https://github.com/ValveSoftware/gamescope/wiki/Tracing
I would advise you to try the above gpuvis profiler first, but there’s also another alternative profiling integration I’ve been working on: Tracy profiler: https://github.com/ValveSoftware/gamescope/pull/1328 One neat thing about Tracy profiler is that the profiling stats are shown live while gamescope is running. Caveat with my Tracy profiler integration:
gamescope-xwm
covers some post-render/post-present work, so can be a bit longer than the actual frame timeNote that when building gamescope w/ my Tracy integration, you have to add -DTRACY_ENABLE -Dtracy_enable=true
to meson setup/configure cmd
Also you have to install the Tracy server to view the profiling data captured from gamescope
How to build Tracy server:
First make sure you have capstone
installed
git clone https://github.com/wolfpld/tracy.git
cd tracy/profiler
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
MAKEFLAGS="-j$(nproc)" make
Now that I think about it, I should probably add some code to use the Tracy custom data plotting functionality w/ the frametime info that’s passed to mangoapp…
this is all tested from SteamOS Main in the shipped gamescope-session, with gamescope compiled from latest git using the SteamOS PKGBUILD paired with the steamos devkit client. I will give Tracy profiling a test in a bit, but the devkit makes gpuvis stupidly easy to use.
from boot (with adaptive sync disabled): (deck@steamdeck ~)$ gamescopectl adaptive_sync 1 (deck@steamdeck ~)$ gamescopectl adaptive_sync 0 (deck@steamdeck ~)$ gamescopectl adaptive_sync 1
the gpu-trace.zip: https://drive.filen.io/d/69d86e9d-f12e-4b84-9d19-3d7be92f8f8f#flpeGVfwN6bFhRsi9B70uOWI6pFIpLaY (way too big for GitHub lol)
Hopefully this offers a bit more insight into the issue. Let me know if you need me to recapture anything.
Thanks for the new gamescopectl commits @sharkautarch running tracy with sudo in drm mode has its own set of issues (running steam as root is a big no) but even without sudo hardware vblank logs the tracy profiler has showed some interesting stuff.
Here we have gamescopectl adaptive_sync 0
, so adaptive sync disabled.
Here we have gamescopectl adaptive_sync 1
It seems like these repeated longer paint_all() -> Present()
calls may be the reason I'm seeing decreased performance with adaptive sync on? The longer initial paint_all() -> Present( &frameInfo,async )
seems normal when changing between adaptive sync options
I also exported my tracy session since I'll be honest, most of the code in gamescope is still a big ??? for me :frog: adaptive-sync-testing.tracy.tar.gz
Edit: Well it turns out I get much clearer results if I also force the Steam UI to continuously render video frames with the debugging parameter for it: adaptive-sync-constant-render-with-mangohud.tracy.tar.gz
It seems like adaptive sync also gives me this lockable which does not seem to show up when gamescope is not using adaptive sync. This could be as designed so maybe it's a nothingburger but that's the other major difference between the two states (aside from the very erratic vblanking of course)
Just as a side note - it seems tracy profiling triggers some instability in gamescope-session if monitoring for too long.
edit: on the left is adaptive_sync 0 and the right is adaptive_sync 1, with continuously rendered frames in the Steam UI on the Home page of Game Mode
took some time to dive deeper into gpuvis today after packaging it for Fedora, and the results are pretty similar to what I was seeing with Tracy (although I've still only been able to run that as non-root so gamescope/steam don't complain) but that's a good indication of its accuracy @sharkautarch
@matte-schwartz a day or so ago, I had made a new commit to my tracy PR to attempt to fix the instability you saw Are you still having any issues w/ crashing w/ the tracy PR?
It seems like adaptive sync also gives me this lockable which does not seem to show up when gamescope is not using adaptive sync.
Unless you changed the setting yourself, Tracy profiler only displays lock-events wherein at least one thread is blocked waiting to take a lock (meaning a different thread has already acquired the lock).
So it could be an issue in a different scenario, but from looking at your screenshot, there's no lock contention during the long 24ms gamescope-xwm frame so it is extremely unlikely that it is directly causing lag. I'm pretty sure from the extremely low frequency of lock contention, and the duration looking somehow short, that it is unlikely that this could be indirectly causing lag either.
However, it's possible that the increased lock contention is a symptom of the underlying cause of increased lag when using mangohud + adaptive sync is on.
@matte-schwartz a day or so ago, I had made a new commit to my tracy PR to attempt to fix the instability you saw Are you still having any issues w/ crashing w/ the tracy PR?
No more crashing, was using it merged on top of gamescope master for a bit yesterday
@sharkautarch On your edit, thanks for the additional context and breakdown. The intricacies of what you're talking about are a bit beyond me, so I think I've taken this as far as I can
I'll wait and see what @Joshua-Ashton makes of this when they get the chance to check out the issue
@matte-schwartz I took a look at the tracy file you sent me From looking more closely, it seems like vblankmanager's wait times are sometimes longer than they should be. Where it will briefly spike up to ~19-20 ms.
W/ the two screenshots above, the vertical bars at the top show the 'frame-timing' for vblankmanager. (For those that are unaware, vblankmanager gives gamescope a schedule for when and how long it should wait in-between frames. So w/ the tracy integration, each profiled 'frame' for vblankmanager measures the time duration starting w/ the first time vblankmanager was sent an update (not including updates from the previous vblank) and then ending when gamescope wakes back up from idling )
So the yellow bars are vblank times of ~19-20ms, and the green bars are around ~11ms
The weird thing I'm seeing is that gamescope is sometimes trying to present in the middle of a vblank 'frame'. as can be seen in the second screenshot. It's possible that this is due to a 'stale' vblank being sent out, which should be picked up by gpuvis (I should probably update my tracy profiling code to pick that up as well...)
That being said, I also noticed that at one point, a vblank 'frame' somehow fully overlapped one present, and then also partially overlapped a tiny bit of the present before that:
When I last tested my tracy profiler integration w/ gamescope, (which was only in nested mode on my x11 desktop), I definitely saw some more innocuous looking overlap between vblank and present, but there was only ever a tiny bit of overlap...
Also, see the frame-timing overview for gamescope-xwm
, where w/ the blue vertical bars, it's only ever taking at most 2.68ms (which are the slightly more elevated blue bars) for each recorded gamescope-xwm
frame.
Interestingly enough, it seems like the slightly-longer 2.68ms frame-times are always the frames before the long 19-20ms vblankmanager frames:
And now that I look back at screenshot 2 and the vblankmanager frames, I've realized that all or most of the 19-20ms vblankmanager frames overlap a ~10ms present, and that ~10ms present somehow isn't inside of gamescope-xwm
frame.
The only reason for why a present wouldn't be inside of a gamescope-xwm
frame is if steamcompmgr didn't receive a vblank. This is because I had arbitrarily decided to code my tracy integration that way.
This could mean one of two things:
Looking at this fifth screenshot, where we have a 2.68ms normal gamescope-xwm
frame, followed by one vblankmanager
frame w/ a ~10ms repaint, followed by a ~5microsecond gamescope-xwm
frame that doesn't do a present for some reason, followed by a ~11ms vblankmanager
frame containing a 2.17ms repaint:
I'm now leaning towards the latter of the two things...
Something else I've noticed is that there is a significant amount of "hitching" between direct scan-out and gamescope compositing when VRR is enabled that is not present with VRR disabled. There's a solid 1-2 seconds of laggy response time I'd say. It's pretty noticeable if you're using the drm
backend on a high refresh-rate monitor and invoke or move the cursor with VRR disabled, and then do the same with VRR enabled.
@matte-schwartz
Oh I just recently noticed that there were some locks used within the drm
backend that I somehow entirely missed, and so my tracy profiler hasn't been instrumenting them...
I will update it soon to add instrumentation for those locks
@matte-schwartz I have just added the missing instrumentation for the locks used in the drm backend. I also instrumented the locks used in the wayland backend as well, while I was at it
Cool, grabbed a new .tracy although I only made it ~30 seconds before gamescope crashed without me really doing anything.
@matte-schwartz
I took a look of the callstack of the crash which tracy conveniently automatically records.
Looks like it crashed inside update_wayland_res()
, so I pushed out a new commit that comments out the one line I had recently added to that function.
Let me know if it's still crashing for you
@matte-schwartz Is this issue still acting the same, after issue https://github.com/ValveSoftware/gamescope/issues/1398 was fixed?
@sharkautarch no change, that was a different LE Deck specific issue with only those 3 refresh rates. This issue happens on all my devices, including the LE Deck.
@matte-schwartz Hmm I’m still a bit unsure about why commit 299bc34 is causing this issue… Could you revert that commit, and then, with that commit reverted, profile it w/ tracy, including toggling adaptive sync on and off during the profiling session? That way we have a baseline to compare the previous traces against
3.14.28 with a revert: adaptive-sync-with-revert.tracy.zip
3.14.28 without a revert adaptive-sync-no-revert.tracy.zip
actually @sharkautarch, on closer analysis, I am starting to wonder if this revert is just hiding a different adaptive sync issue... let me explain my though process on this
The Steam Client beta for Steam Deck has a new blur effect on the left side menu when you invoke it from gamescope-session. This blur effect makes gamescope start compositing when it's opened. On 3.14.28 with the revert, I notice that opening this side menu tanks frametiming in a similar way to the bisected bad commit here made it happen with the entire Steam UI - just by toggling adaptive sync.
The fact lag happens under different conditions (i.e. even with gamescope compositing unlike the issue report) just by toggling adaptive sync makes me suspect perhaps we are barking up the wrong tree here... and I wonder if it's actually something to do with adaptive sync itself?
My repro steps with the revert from SteamOS 3.7:
gamescopectl adaptive_sync 1
to enable adaptive syncadaptive-sync-side-menu-with-revert.tracy.zip
and then a fun slow-mo video clip since this is kind of tough to explain without showing:
https://github.com/user-attachments/assets/9ad7132a-85ea-4686-8bad-3d528bc2765a
You can see it slow down considerably when I send the adaptive sync on signal ~0:07 in to that clip. Let me know what you think. Might just be grasping at straws here but the similarity in how adaptive sync makes this other issue happen is suspicious to me.
@matte-schwartz Hmm the trace for the side menu thing definitely looks similar to the other traces...
Try doing one trace running gamescope with the env variable GAMESCOPE_DISABLE_TIMERFD=1
then also try doing another trace where you pass the arguments --backend sdl
to gamescope
GAMESCOPE_DISABLE_TIMERFD=1
- no real difference I could tell: adaptive-sync-timer-fd.tracy.zip
--backend sdl
doesn't launch, it throws Aug 08 11:19:03 steamdeck gamescope-session[28424]: SDL_Vulkan_CreateSurface failed: Vulkan couldn't find a predefined mode for that window size and couldn't create a suitable mode.terminate called without an active exception
and then coredumps when added in /usr/bin/gamescope-session
edit: going to try poking at some older versions of gamescope to see if I can find one where the issue is still reproable while also having a functional SDL backend. can't be earlier than 3.14.2 though. nothing i'm finding since the --backend
command was added
Things look okay for now, let's start tinkering.
it starts to go a bit haywire here, with seemingly rhythmic switching between delta flight timing on the page flip handler. this brings my Home page ui from 120fps down to ~90fps on my ROG Ally
now this seems to send every delta consistently to around double what it should be, and brings my UI performance on the Home page down even further to ~70fps.
this brings deltas back into the range of exactly where they should be.
this also brings deltas back into an acceptable range, even when mangohud is unhidden.
So basically, the best that I can conclude is that something about adaptive sync and the way the Steam UI utilizes window focus is what's actually causing problems here... what I'm lost on is exactly where this could be happening when kernel drm debugging logs don't really show anything noteworthy. perhaps why exactly this does work, albeit briefly, until you interact with the side menus or happen to get a steam notification offers a clue?
edit: oh, and deltas go back to the ~8332000 range under every circumstance if you force compositing
this seems to be fixed with the changes since 3.15.0 today on all devices I've tested on. will report any new issues that pop up separately.
this issue has not gone away after all.
https://github.com/ValveSoftware/gamescope/issues/1513 is a potential duplicate of this, as the same base commit is what causes errors to arise
overplay planes seem to pull away VRR focus between the overlay and the primary plane when in-game now. this issue can be mitigated with gamescopectl adaptive_sync_ignore_overlay 1
my latest findings on this are that now, regardless of compositing status, having any mangoapp overlay displayed in embedded gamescope while VRR is enabled will cause erratic frametimings and delta times to become staggered. This is especially visible on monitors where you can use an on-board OSD to verify the refresh rate the monitor is lighting up at, and in games where performance does not reach the maximum monitor refresh rate. With mangoapp on display, you can see the OSD switch between 120hz and the game's actual refresh rate. The issue is mitigated with adaptive_sync_ignore_overlay
or with adaptive_sync_overlay_cycles
and greater than 1 cycle.
I'm beginning to wonder if mangoapp itself is now causing VRR issues, although I'm not quite certain how to debug that... in any case, gpu-trace really does not show anything too unusual beyond staggered delta times between when VRR is in use vs not in use.
gpu-trace in Ghost of Tsushima with overlay visible and VRR on: https://drive.filen.io/d/53bb9723-bf4e-45a6-b810-ff026ce96937#vgb4tIwSZRyoR7vVFMPLl9pl5K4FMO29
gpu-trace in Ghost of Tsushima with overlay disabled and VRR on: https://drive.filen.io/d/e8a48d41-022b-4dc1-9c10-fdd87ab90db1#nSCE5bI7ip31uVqAUhbnmbpNeWYCuM3u
after spending many hours debugging this, most of this issue is honestly a pretty inaccurate summary of the real problem. will file a report for the actual bug instead.
299bc34 causes the Steam UI to lag when gamescope is not actively compositing, as determined by the compositor debugging squares (enabled the entire time) with MangoHud overlay. unfortunately, this issue is unrelated to the other gamescope compositing issues I reported because of course life can't be that simple 🐸
Before reverting 299bc3410dcfd46da5e3c988354b60ed3a356900 - stutter and ~40fps (from 6f4bc2e7a2cc2623bae6c26de72ab26e63ffa329):
https://github.com/ValveSoftware/gamescope/assets/136293710/d6a7e85a-465e-48f3-bb9a-e0a420aa813a
After reverting 299bc3410dcfd46da5e3c988354b60ed3a356900 - no stutter and 60fps (from 6f4bc2e7a2cc2623bae6c26de72ab26e63ffa329):
https://github.com/ValveSoftware/gamescope/assets/136293710/5e75a710-b51e-472f-ad67-2d946e937453