ValveSoftware / SteamOS

SteamOS community tracker

Super Mario Odyssey (Yuzu mainline 1638): poor frame rate with unlocked GPU frequency (default setting) compared to manual GPU frequency, and worse performance with SMT enabled #1274

Open ingramli opened 10 months ago

ingramli commented 10 months ago

Your system information

Please describe your issue in as much detail as possible:

At default overlay settings, the average FPS is around 40. With manual GPU frequency enabled and set to 1600 MHz, the average FPS increases to 47.

(Screenshots: GPU auto; GPU 1600)
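To put the two readings on a common footing, the short sketch below converts the reported averages (40 and 47 FPS, taken from the description above) into a relative gain and per-frame times. The function names are illustrative only and are not part of any tool mentioned in this thread.

```python
def fps_gain_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative FPS change in percent, measured against the baseline."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Averages reported in this issue: GPU auto vs. GPU locked at 1600 MHz.
    auto_fps, manual_fps = 40.0, 47.0
    print(f"gain: {fps_gain_percent(auto_fps, manual_fps):.1f}%")
    print(
        f"frame time: {frame_time_ms(auto_fps):.1f} ms"
        f" -> {frame_time_ms(manual_fps):.1f} ms"
    )
```

With these numbers the gain works out to 17.5%, or roughly 25.0 ms per frame dropping to about 21.3 ms.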

Steps for reproducing this issue:

  1. Enter the game, start a new game
  2. Wait until the intro cutscene is over and the game becomes controllable, then use the right analog stick to look up until the viewing angle is maxed out. Make no further controller input.
  3. Compare the frame rate by toggling manual GPU frequency on and off.

safijari commented 10 months ago

Emulation is famously more CPU-dependent than GPU-dependent (Switch emulation in particular). In the video linked below, the GPU has been manually capped at 1000 MHz to allow the CPU more headroom, and they're getting better performance than you, with the CPU clocking a lot higher.

Are you capping TDP? What's the total system power draw in both scenarios?

https://youtu.be/_8BUwWEYh-c?si=uOYmB8JxwmpliWmw

Note: Valve employees are unlikely to replicate an issue with emulation so this ticket might not be actionable for them. I'll try this if I can borrow my friend's copy of Odyssey and report back my findings.

ingramli commented 10 months ago

Replying to https://github.com/ValveSoftware/SteamOS/issues/1274#issuecomment-1837150498

The TDP was uncapped (15W), and the CPU was drawing only about 2 or 3W whether or not the GPU frequency was locked, so I believe the TDP limit does not play a role here.

In addition, and perhaps unrelated to this behavior: following advice from another member in the Steam discussions, I used an unofficial tool, PowerTools from Decky, to disable SMT. This yields a significant performance improvement (an almost stable 60 FPS with or without the GPU frequency locked); the CPU runs at a higher frequency and draws much more power compared to SMT enabled. It seems the fix implemented in 3.5.5, "Fixed an issue where certain workloads would exhibit severe CPU performance issues unless SMT was manually disabled.", does not yield an improvement equivalent to disabling SMT in some situations.
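For anyone who wants to confirm the SMT state independently of any plugin, here is a minimal sketch that reads the generic Linux sysfs files `smt/control` and `smt/active`. It assumes SteamOS exposes them at the usual kernel location; `describe_smt` and `read_smt` are hypothetical helper names, and nothing in this thread states which mechanism PowerTools itself uses.

```python
from pathlib import Path

# Standard Linux sysfs location for the SMT switch.
# Assumption: SteamOS exposes it at the same path as other distributions.
SMT_DIR = Path("/sys/devices/system/cpu/smt")


def describe_smt(control: str, active: str) -> str:
    """Turn the raw contents of smt/control and smt/active into a summary."""
    control = control.strip()
    if control in ("notsupported", "notimplemented"):
        return f"SMT unavailable ({control})"
    state = "active" if active.strip() == "1" else "inactive"
    return f"SMT control={control}, sibling threads {state}"


def read_smt(smt_dir: Path = SMT_DIR) -> str:
    """Read both sysfs files and summarise them."""
    control = (smt_dir / "control").read_text()
    active = (smt_dir / "active").read_text()
    return describe_smt(control, active)


if __name__ == "__main__":
    print(read_smt())
```

Reading these files needs no special privileges. Writing `off` to `smt/control` as root is the kernel's documented way to disable SMT at runtime, and comparing results with and without that change is one way to separate the SMT effect from the GPU clock effect.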

safijari commented 10 months ago

https://github.com/ValveSoftware/SteamOS/assets/5191844/3a85e2b6-5b80-40f3-88fb-d97791e518a5

I can replicate your findings on 3.5.8. Now that the clock speed visual glitch is gone we can also see what the behavior is. With TDP limit off and clock speeds left uncapped, both the CPU and GPU clock down to lower than they need to be. Forcing a manual GPU clock seems to fix it but note the increase in CPU clocks at the same time. Uncapping GPU clocks then causes both CPU and GPU clocks to slowly ramp down until performance starts to degrade.
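The ramp-down described above could also be captured outside the overlay by polling sysfs once per second. This is only a sketch under stated assumptions: the GPU is assumed to be `card0`, the amdgpu driver is assumed to expose `pp_dpm_sclk` with the active level marked by `*`, and the helper names (`cpu_mhz`, `active_gpu_mhz`, `sample`) are made up for illustration.

```python
import re
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")
# Assumption: the GPU is card0; the index can differ between systems.
GPU_SCLK = Path("/sys/class/drm/card0/device/pp_dpm_sclk")


def cpu_mhz(khz_text: str) -> float:
    """scaling_cur_freq reports kHz; convert it to MHz."""
    return int(khz_text.strip()) / 1000.0


def active_gpu_mhz(pp_dpm_sclk_text: str):
    """Return the clock of the level marked '*' in pp_dpm_sclk, or None."""
    for line in pp_dpm_sclk_text.splitlines():
        match = re.match(r"\s*\d+:\s*(\d+)\s*mhz\s*\*", line, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None


def sample():
    """Take one snapshot: (per-CPU MHz list, current GPU MHz or None)."""
    paths = CPU_ROOT.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")
    cpus = [cpu_mhz(p.read_text()) for p in paths]
    gpu = active_gpu_mhz(GPU_SCLK.read_text()) if GPU_SCLK.exists() else None
    return cpus, gpu


if __name__ == "__main__":
    import time

    while True:  # stop with Ctrl+C
        cpus, gpu = sample()
        avg = sum(cpus) / len(cpus) if cpus else float("nan")
        peak = max(cpus, default=0.0)
        print(f"cpu avg {avg:.0f} MHz, max {peak:.0f} MHz, gpu {gpu} MHz")
        time.sleep(1)
```

Running this in a terminal while toggling the manual GPU clock would give a timestamp-free log of how both clocks move, which is easier to attach to a report than a video.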

I installed PowerTools and disabled half the cores, and this issue went away.
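On a CPU with SMT, "half the cores" in a tool's thread list usually corresponds to the second hardware thread of each physical core, although the comment above does not say which ones were disabled. As a hedged illustration, the sketch below reads the kernel's `thread_siblings_list` files and lists the logical CPUs that are not the first thread of their core. The helper names are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list:
    """Parse a kernel CPU list such as '0-1' or '0,4' into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def secondary_threads(sibling_lists: dict) -> list:
    """Given cpu -> thread_siblings_list text, return every logical CPU
    that is not the lowest-numbered thread of its physical core."""
    extra = set()
    for text in sibling_lists.values():
        extra.update(parse_cpu_list(text)[1:])
    return sorted(extra)


def read_sibling_lists(root: Path = Path("/sys/devices/system/cpu")) -> dict:
    """Collect thread_siblings_list for every CPU that is currently online."""
    lists = {}
    for path in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # directory name is "cpuN"
        lists[cpu] = path.read_text()
    return lists


if __name__ == "__main__":
    # Run this while all CPUs are online; offline CPUs expose no topology.
    print("secondary SMT threads:", secondary_threads(read_sibling_lists()))
```

Comparing this list with the CPUs a tool reports as disabled shows whether it turned off SMT siblings or whole physical cores, which matters when interpreting the performance difference.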

Tau5 commented 10 months ago

Can replicate while using Mario Kart 8 Deluxe on latest mainline

schM0ggi commented 10 months ago

Question:

Are you making use of the power limiters (for CPU and/or GPU) in PowerTools by chance?

ingramli commented 10 months ago

Question:

Are you making use of the power limiters (for CPU and/or GPU) in PowerTools by chance?

If you are asking me: in PowerTools I didn't touch anything other than SMT. All settings, unless specified otherwise, are at their default configuration (the TDP limit is uncapped, i.e. 15W), so the CPU and GPU can run at any speed within the Deck's specifications.

schM0ggi commented 10 months ago

Okay. I'm just asking because PowerTools seems to have had a bug (for some months now, actually) with loading profiles correctly when used in combination with the Steam Deck's built-in clock and/or TDP settings.

I'm actually interested to see if I can reproduce similar behavior on my side, but I don't have the game. I do have Skyward Sword HD in Yuzu for testing purposes, if you can provide a similar situation. Otherwise I have some emulation going on with GameCube, N64, etc. (Gen 4-6 stuff). A PC game is also an option if we can find one we both have that shows a similar problem.

Sidenote: while SteamOS 3.5 does improve performance for emulation with SMT enabled (observed with GameCube), I personally still recommend disabling SMT, at least up to Gen 6, because (1) those systems run well with SMT disabled and (2) other systems, especially N64 with ParaLLEl RDP, which is CPU-heavy, don't run well with SMT enabled, even on SteamOS 3.5. For whatever reason, the CPU clock doesn't go as high as needed with all 8 logical cores active until the clock is set manually, and even then I find the performance not as good as with SMT disabled at the same clock speed.

ffrasisti commented 10 months ago

so the smt situation in emulation was never fixed?

Tau5 commented 10 months ago

It seems there was a regression: on 3.5.7 with default settings there is no slowdown, but on 3.5.9 there is.

waspennator commented 10 months ago

so the smt situation in emulation was never fixed?

Apparently it was fixed for certain situations but not all of them.

schM0ggi commented 10 months ago

so the smt situation in emulation was never fixed?

The real question is: is it something that needs a fix because it's some sort of bug, or is it what it is because of how the APU is designed and how (some) emulators interact with it? As far as I understand, there was a rather old, general bug in the 5.x kernel regarding AMD CPUs, not something Steam Deck-specific. It's quite possible that this bug is fixed in the newer kernel that comes with 3.5, and that you can observe performance gains in older, single-core games/apps. But I didn't run specific tests for that.

Actually, one should be able to easily check how emulation (same emulator, same game, etc.) behaves on a Linux AMD desktop machine with an up-to-date kernel, as far as clock speed is concerned. I'm running one and I think I'll do that in the next few days; I'm curious.