Closed coornio closed 2 years ago
Purely reporting and nopped out on linux/macos for the most part. The spinlock showed the issue is likely inconsistent wake times in the framelimiter.
Did anyone actually reproduce this on linux/macos?
on MSC / Windows, startCycle = __rdtsc(); @ https://github.com/PCSX2/pcsx2/blob/6f7890b709d5e3f7f5b824781e493455efc92339/common/src/x86emitter/cpudetect.cpp#L97
Might not be related to anything here but I might open a bug to consider changing MSC to QPC, or if willing to sacrifice Core 2's, RDTSCP. It is quite possible that RDTSC accuracy is impaired by Meltdown/Spectre and subsequent vulnerability fixes.
Did anyone actually reproduce this on linux/macos?
I don't think we've even reproduced it on Windows. It seems pretty specific to OP's setup, which is why we're bouncing test cases to them.
Now that I've got access to both my systems,
I'm not seeing significant variation between my 5690x and 1680v2 systems running w7 and 10 respectively on my own builds either, it matches TheLastRar's observations really, mostly 50's with dips to 49.95. (PAL50 KH)
Certainly no drops beneath 49, excepting some transition loading screens like between save menu and the game being ready to play.
It's certainly not 49.99 like in the "good" builds coornio mentioned, but it's not a significant impact.
System A has exploit mitigations off, system B has all on but import optimization disabled.
To check if the issue is CPU power states / cache or just Windows doing a bad job of waking us up on time, try adding this after the sleep:
u64 wake = GetCPUTicks();
if (wake > uExpectedEnd)
{
    // cast so the format specifier matches a 64-bit value on every compiler
    fprintf(stderr, "FrameLimiter: Woke up %lluµs too late!\n", (unsigned long long)(((wake - uExpectedEnd) * 1000000) / GetTickFrequency()));
}
If someone's bored enough to make a build of it real quick, I can give that a try as well. Otherwise, setting up Visual Studio to compile it on my own will take a while; I typically rely on Code::Blocks project files in the other projects I'm involved in :P
That said,
I'm not seeing significant variation between my 5690x and 1680v2 systems running w7 and 10 respectively on my own builds either, it matches TheLastRar's observations really, mostly 50's with dips to 49.95. (PAL50 KH)
It is rather odd how I'm still the lone exception so far, but then again I haven't seen anyone test this situation with a CPU closer to mine. The 1680v2 is a Xeon Ivy Bridge, while mine's a Kaby Lake. Shot in the dark: architectural differences (and potentially exploit microcode patches, as mentioned previously) might be large enough to explain why you're not experiencing this.
It has gotten me rather curious however, so I got a Windows partition going on an external drive and I will test this fresh on there, see if it's actually better. If not, we'll at least know it's nothing caused by the OS somehow. If it is, please recommend spectacular ways to throw myself off a window and film it for everyone.
So, status update... I got my external drive with the backup W10 install on it (1803), waited for it to finish updates, copied over the pcsx2 folder, ran tests on 953, 955, waittime, spin versions.. all of them appear to behave "mostly" the same.
I was thinking that perhaps something might have changed from 1803 to 20H2 to explain why it appeared to work fine initially. I booted back with my main system drive again and I couldn't reproduce this bug all of a sudden.
However, a thought crossed my mind before posting this message and I turned up Firefox again, and boom the inconsistency has returned in full force like before. This is a very interesting finding, because on my backup system I did not have any other apps running, clean install and all.
This leads me to believe that whatever causes this is severely exacerbated by the presence of other applications actively making use of the CPU. For testing purposes, I isolated Firefox processes from making use of the same cores as PCSX2, effectively keeping them separate. The inconsistency was immediately reduced a significant amount, though it's still present. It still jumps around, but the dips that would fall to 45 frames now only go around 56, implying there's still some "bleed" despite the separation of threads. Closing Firefox entirely seems to reduce the inconsistency margins even more, to less than 1 fps fluctuations. The EE usage however remains consistently higher as opposed to builds with the dummy thread.
I should note that all prior tests were done with a bunch of applications running in parallel, the builds that I mentioned working super-well were not affected at all under this same "stress".
Not sure if this information helps in isolating this further. I'll revisit the backup system soon to give this another go for I may have messed up by not giving the CPU something more to do than just emulate.
It may be worthwhile seeing if Windows 10's Game Mode can influence that; it's supposed to prioritise games (which PCSX2 will hopefully appear as).
I've made a build with the additional logging. Specifically, the code looks like this (but is otherwise mostly identical to 955):
if (msec > 1)
{
    Threading::Sleep(msec - 1);
    const u64 wake = GetCPUTicks();
    if (wake > uExpectedEnd)
    {
        Console.Error("FrameLimiter: Woke up %I64d microseconds too late!", ((wake - uExpectedEnd) * 1000000) / GetTickFrequency());
    }
}
If oversleeping is the issue then I would probably expect that message to get spammed when you start to dip.
Please post your emulog file when testing with this build.
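For anyone who wants to check wake-up lateness outside the emulator, the same idea can be sketched with the standard library. This is only an illustration: MeasureOversleepUs is a made-up name, and std::chrono::steady_clock stands in for the emulator's GetCPUTicks / GetTickFrequency pair, so the numbers are not directly comparable with the emulog output.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper, not PCSX2 code: sleep for `requested` and report how
// many microseconds after the expected wake time the thread actually resumed.
std::int64_t MeasureOversleepUs(std::chrono::microseconds requested)
{
    using Clock = std::chrono::steady_clock;
    assert(requested.count() >= 0);

    // Same role as uExpectedEnd in the snippet above.
    const Clock::time_point expectedEnd = Clock::now() + requested;
    std::this_thread::sleep_for(requested);
    const Clock::time_point wake = Clock::now();

    const std::int64_t lateUs =
        std::chrono::duration_cast<std::chrono::microseconds>(wake - expectedEnd).count();

    // Clamp so the result only ever describes lateness, never an early wake.
    return lateUs > 0 ? lateUs : 0;
}
```

Calling MeasureOversleepUs(std::chrono::microseconds(2000)) in a loop and printing the results gives a rough picture of how late the OS scheduler tends to wake a thread on a given machine.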
Well this is certainly interesting. There's a lot of moments when the framelimiter wakes up too late, sometimes over 100μs late. That's basically just over 0.1ms so I'd say it's negligible. The unique part of this though is that these late wakes have nothing at all to do with the dips. They just happen randomly and do not match up with when the dips occur at all.
Here's the log: emuLog.txt
Thanks. That seems to suggest that the issue isn't due to inaccuracies in sleep, which also explains why the 2-6ms builds didn't improve consistency that much.
Did you manage to do a test with Windows 10 Game Mode? Bit of a shot in the dark, but it might help Windows focus on PCSX2. Another thing to try: if you are on the Balanced power plan, switch to the High Performance power plan.
Yeah I'm not really sure what to make of this issue anymore because OP's information seems pretty inconsistent and contradictory.
2-6ms builds apparently incrementally improve things and a straight spinlock is being reported as perfect. Each one increases the margin of error allowed for the sleep, and the final one straight up eliminates the sleep altogether.
The only thing I can think to say given that is it's the wait times, but if air's build isn't showing inaccurate wait time then I'm at a loss for what it is.
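To make the difference between those test builds concrete, here is a rough sketch of a "sleep, then spin" wait. It is an assumption about the general shape of such a limiter, not the actual PCSX2 code; WaitUntil and spinMargin are hypothetical names. A margin of zero is a pure sleep, a margin of a few milliseconds resembles the 2-6ms builds, and a margin covering the whole wait is the straight spinlock.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: sleep coarsely, then busy-wait the last `spinMargin`
// before `deadline` so a late wake-up from the sleep can be absorbed.
void WaitUntil(Clock::time_point deadline, std::chrono::microseconds spinMargin)
{
    assert(spinMargin.count() >= 0);

    const Clock::time_point sleepUntil = deadline - spinMargin;
    if (Clock::now() < sleepUntil)
        std::this_thread::sleep_until(sleepUntil);

    // Busy-wait for the remainder. This keeps the core loaded, which is also
    // why a pure spin shows up as 100% usage.
    while (Clock::now() < deadline)
    {
    }
}
```

The trade-off is the one discussed in this thread: a larger margin tolerates later wake-ups but burns more CPU time doing nothing.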
None of us seem to be able to replicate the issue on a pretty wide range of HW at this point as well as different versions of Windows 10 with and without mitigations.
Additionally, OP is reporting things that don't make any sense like changes in VU/GS percentages or improvements to uncapped framerates which shouldn't have anything to do with the frame limiter itself as ref pointed out.
I think it's time OP started investigating that this might be an issue with their setup or testing methods and not with the emulator itself.
@tadanokojin Forget what I mentioned about uncapped framerates. Even if my observation on that wasn't inaccurate, it's not the focus of this bug report anyway. As for inconsistent/contradictory information, I would indeed like you to point out why you think that is. I have neither reason to lie, nor am I losing my mind. It's very easy to see whether a particular test build is having dips and consistent framerates, the hard part is figuring out why exactly it happens.
I am quite certain at this point that there's something unique about the situation, but whether that's strictly limited to me due to some unknown circumstance or weird software "damage" of sorts on my setup, I can't tell for sure. Not yet.
I reported the differences in VU/GS usage in the event that they might provide some clue. For the most part they are tied to the erratic behavior of the limiter and by extension the erratic EE usage while the bug is in effect.
As for the testing methods, there's really not much to them. There are videos linked above where you can see the simple, straightforward approach I am taking to test this. The ease with which this behavior is reproduced on my end is laughable, what with it affecting all games I've tried so far.
Rant over the implied accusation that I'm pulling someone's leg or that I don't know what I'm doing/reporting aside, to answer @TheLastRar: Game Mode has always been on. My power plan was indeed on Power Saver previously.
Testing shows that while HWiNFO reported a 4.8 Ghz boost on all cores on that plan, the dips were still present. Once I changed to High Performance which forces the CPU to stay locked at that frequency at all times regardless of load, the consistency of framerate has returned.
EDIT: increased the refresh frequency on HWiNFO to have a better look at clocks. It seems like the downclocking of the CPU really is the cause of the dips. It appears that Windows might well be trying to balance the load too much and reaching an odd tipping point where the usage is too much for a low clock, but too low for a high clock, swinging back and forth each time and causing this inconsistency in the frame limiter.
EDIT 2: did yet another test with 953 -- despite the Power Saver power plan, the CPU is pegged at max frequency at all times. I am thinking that the presence of the spinning but useless net thread kept the CPU fed with useless cycles, keeping the frequency topped up each time and thus why it was not encountering the bug. In this case, build 955 basically exposed this behavior through an optimization meant to make the emulator more efficient. Damn.
What are your thoughts on this? Has anyone so far tried to employ maximum power saving on their intel system (so that it automatically clocks down as low as it goes when not in use) with dev 955+ or were you all previously locked in max cpu freq?
There was no implied accusation that you were pulling someone's leg or that you were lying. That's your uncharitable interpretation of what I said. There are all sorts of other explanations that don't require you to be acting in malice and I made no such accusation that you were. In fact I suggested the issue might have to do with your setup or testing methods, which would imply the opposite: that the problem is real but perhaps not with the emulator, or that you're unknowingly presenting us with information that doesn't make a lot of sense.
Please don't throw out accusations.
The thread was keeping the CPU active by spinning, therefore keeping your CPU clocked high. Just a side-effect of a bug that managed to benefit your system. Set your power plan to high (why not when emulating, honestly). Maybe soon, maybe later the frame limiter will properly handle CPU downclocking. / my thoughts
The dummy thread was providing continuous load, which may have kept the CPU/Windows in a higher power state. Without it, the CPU would have downclocked while PCSX2 was waiting before starting the next frame.
refraction suspects that the frame dips are because we time how long we should wait based on cpu ticks, and our logic gets confused by the changing cpu frequency/tickrate.
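To illustrate that suspicion with numbers: if elapsed time is derived from a tick count using one assumed frequency, but the counter actually advanced at a different rate, the computed time is off by the ratio of the two. The sketch below is only arithmetic under that assumption; both function names are made up, and whether the counter PCSX2 reads really varies with the core clock on this CPU is not established in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a tick delta to microseconds using the
// frequency the caller believes the counter runs at.
std::uint64_t TicksToMicroseconds(std::uint64_t ticks, std::uint64_t assumedHz)
{
    assert(assumedHz > 0);
    return (ticks * 1000000ULL) / assumedHz;
}

// Hypothetical helper: ticks accumulated over `realUs` microseconds by a
// counter that actually advances at `actualHz`.
std::uint64_t TicksElapsed(std::uint64_t realUs, std::uint64_t actualHz)
{
    return (realUs * actualHz) / 1000000ULL;
}
```

For example, 20000µs of real time on a counter running at 0.8 Ghz is 16,000,000 ticks; converting that with an assumed 4.8 Ghz gives 3333µs, so the limiter would believe far less time had passed than really did.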
One more thought from my last comment, the potential reason others did not come across this scenario despite testing an intel platform has to do exactly with the clocks the CPU itself is capable of. It's all about speedstep, and as @TheLastRar just mentioned, the swinging frequency throws the calculations out of whack.
I locked my CPU at 3.7 Ghz with all 6 cores active instead of the previous 4.8 Ghz. Power plan is still Power Saver (meaning it can clock down on its own if it wants to). Speedstep this time does NOT think the clock is high and so it's staying at 3.7 the entire time. If I raise the limit to 4.8 Ghz again, then Speedstep starts the swinging again and the framerate goes to hell.
I think we solved this. Technically still a regression (and best we don't assume people will lock their device on max performance), but in this case it's not a true fault of the emulator for trying to be more efficient, but Speedstep being overzealous in optimizing clock speed.
I will update the title of this bug accordingly as a result. What do you think we could do as a workaround in this case however? I'd like to stay on the power saving power plan, I am not comfortable leaving my CPU running at 4.8 Ghz all the time and I am sure to forget repeatedly to switch plans.
Original post edited to summarize the cause of this "bug" and the huge thread that followed about it, save you some time and brain power
refraction suspects that the frame dips are because we time how long we should wait based on cpu ticks, and our logic gets confused by the changing cpu frequency/tickrate.
What stepping/revision is the processor, @coornio? I want to check Intel's spec guide to see if there's an issue with Invariant TSC.
Also, have you done anything such as forcing the HPET to be the platform clock (not a good idea)? See if bcdedit lists useplatformclock as true; if it does, run "bcdedit /deletevalue useplatformclock".
Both my systems are on the Balanced profile and the CPU frequency varies on both of them, so there's something in addition to what Ref is theorizing at play, but be aware that the Low setting does more than just limit the CPU to short bursts of max rate.
and more
I believe low power also significantly impairs the behavior of Skylake+ Speed shift.
Don't mind the core voltage, not sure CPU-Z is detecting it right, it moves all over between 0.1V and 0.5V when in idle but HWiNFO shows it practically locked around 0.61V and I trust the latter more. I also did not find my CPU having any other stepping aside from A-U0 online.
As for the HPET, not sure if it's on yet, but I did use WinTimerTester and it reports 10 Mhz exactly, and it appears in the device manager as well. I will do my restart now to check my UEFI and verify whether it's enabled for real, along with the Windows bcdedit and update my post here.
EDIT: So it turns out that ASUS UEFI boards do not have an HPET setting, but it is actually enabled by default there. As such, I enabled it on Windows and rebooted to test. Now it's up to 24 Mhz. The ratio is a stable 1.0000 in all cases (though initially it does appear to be fluctuating a bit over and under that point).
EDIT 2:
EDIT 3: Keeping the power plan on High Performance (max cpu freq) ensures the lowest times all around in terms of latency, understandably. Random peaks can and will occur when the cpu shifts the clocks down, but nothing nearly as bad as when HPET was on. I have since deleted that policy again, especially after seeing some benchmarks online and the issues it's causing.
You were correct however on the Power Saver setting being very aggressive. The huge dips I am experiencing are practically non-existent while in Balanced, despite superficially (setting-wise) the two plans being identical. There's still a small amount of play up and down, but it behaves closest to High Performance which locks the cores on max freq.
In the past I used to be on High Performance and it would allow clocking down, then one day it changed and wouldn't stop doing that until several reboots later and sticking with Power Saver (Balanced also was locked at max freq) so there was some definite Windows weirdness going on before which I have no idea if it's played any role in this.
At the very least, unless the framelimiter gets way smarter to account for these aggressive C states to keep up properly, the issue will remain unless the power plan is changed accordingly. I have found others through google nagging about Kaby Lake's C states causing hitches so this may well be something that's specific to this particular generation, with Speedstep/Speedshift behaving better on the others?
¯\_(ツ)_/¯
Going on a bit of a detour. Are you able to see the minimum clocks your CPU drops to when PCSX2 dips? (It might help if you view it as a graph)
Yes, with 4 refreshes per second it's easy to see. The dips occur when the CPU downclocks to 0.8 Ghz (the lowest it can go even when idle). More specifically though, a single dip is not enough to imbalance the frame limiter, when the big dips happen (more than 10 fps lower), I notice that the CPU is basically going back and forth between lowest and highest clock multiple times in a row. If you prefer, I can look for some app that generates a graph for it.
Hwinfo64 log as a CSV and loaded into Generic Log Viewer
The parts where the frequency often falls to 0.8 Ghz in short order are where the worst, most prominent framerate dips occur. This run is specifically on 955, with the Power Saver plan for best effect.
Looking at the graph, it's possible that the CPU is underclocking so much that it effectively becomes too weak to run the game at full framerate, causing the dips.
Not sure how best to address that without having to use the higher performance power plan. You could potentially also edit the power plan and set the minimum frequency to a higher value. I don't know if that is something you want to try.
You know, that's a good idea to try. I typically let the CPU clock down as much as it's comfortable with but I could have a midpoint instead. I'll give that a try.
800 Mhz is indeed too little to be falling to when emulating, even if the test scene isn't very intensive. I can't quite understand what sort of logic allows for such sharp drops from 4800 to 800 and then back up. Intel's rules I suppose.
Possibly irrelevant (and maybe a bit late), but Windows CU KB5004296 (released July 29) resolved an issue with the power plans not clocking effectively, causing lower framerates in games. If setting the High Performance plan was resolving this issue, then that may have been a factor.
KB5001330 (released April 13) was the first public release CU that caused the issue with power plans.
Emulators were especially hit harder, granted it only applied to fullscreen (exclusive or borderless) and when vsync was enabled.
What's the status on this issue?
I think we solved this. Technically still a regression (and best we don't assume people will lock their device on max performance), but in this case it's not a true fault of the emulator for trying to be more efficient, but Speedstep being overzealous in optimizing clock speed.
This issue is resolved, use a more performant power profile.
Also: https://github.com/PCSX2/pcsx2/pull/4499 was created to prevent this again.
This part is an edit to summarize the results of all the research and back and forth that took place below. As the current title implies, #4214 introduced an optimization to the net code that stopped the net thread from spinning away and doing nothing the whole time the emulator is running. A very sensible approach to optimize the code and avoid wasted cycles.
This optimization has resulted in a regression/bug of sorts, though it's not truly the fault of the emulator and the kind people who sustain it, but rather a freak coincidence as it turns out.
Intel Speedstep/Speedshift is the technology of Intel CPUs to adjust the frequency of each core depending on its load, and also balance the max overclock frequency depending on how many cores are burdened with work. It is why you see clock tables for new processors that say up to 5.2 Ghz for example on a single core or 4.7 Ghz on all cores. AMD does the same thing through their own technology, though apparently it's less dumb.
The issue that arose in this particular situation, which, annoyingly, is very difficult to reproduce due to its requirements, is as follows. You start a game, stand to the side, and the frame limiter kind of freaks out. The FPS counter is all over the place, falling as much as 20% below the proper rate.
This is where SpeedStep/Speedshift comes in. In its duty to regulate the core frequencies to ensure a lower power consumption where appropriate, it destabilizes everything. It basically comes down to having a particular frequency range and a particular core load where one moment it judges the load is insufficient for a high clock, thus it reduces the core frequency, and the next it judges the load too high for a low clock, thus raising them back up. This is most pronounced on the Power Saver plan, and much less evident in the Balanced plan. It effectively goes away when switching to the High Performance plan.
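The back-and-forth described above can be shown with a deliberately crude toy model. To be clear, this is not how SpeedStep/Speedshift actually decides anything; ToyGovernor, its thresholds, and SimulateClock are all invented for illustration. The only point is that a fixed amount of work per interval can look "too light" at the high clock and "too heavy" at the low clock at the same time.

```cpp
#include <cassert>
#include <vector>

// Toy model only: a governor with two made-up utilisation thresholds.
struct ToyGovernor
{
    double minGHz;   // lowest clock it may pick
    double maxGHz;   // highest clock it may pick
    double lowLoad;  // below this utilisation it drops to minGHz
    double highLoad; // above this utilisation it jumps to maxGHz
};

// demandGHz expresses the per-interval work as the clock that would be needed
// to finish it exactly on time. Returns the clock chosen at each step.
std::vector<double> SimulateClock(const ToyGovernor& gov, double demandGHz, int steps)
{
    assert(gov.minGHz > 0.0 && gov.minGHz <= gov.maxGHz);

    std::vector<double> history;
    double clock = gov.maxGHz;
    for (int i = 0; i < steps; ++i)
    {
        history.push_back(clock);

        // Utilisation saturates at 100% when the core cannot keep up.
        double load = demandGHz / clock;
        if (load > 1.0)
            load = 1.0;

        if (load < gov.lowLoad)
            clock = gov.minGHz;
        else if (load > gov.highLoad)
            clock = gov.maxGHz;
    }
    return history;
}
```

With a governor of {0.8, 4.8, 0.5, 0.9} and a demand of 1.5, the clock alternates between 4.8 and 0.8 on every step, while a demand of 4.0 leaves it at 4.8. That matches the observation that constant extra load (the old net thread, or a spinlock) stops the swinging.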
This swinging core frequency has been responsible this entire damn time for the inconsistent framerates the emulator was reporting and all the hitching it was experiencing as a result when trying to catch up to these frequency changes. It explains why the existence of the always-active net thread, the spinlock frame limiter variant, and a locked-core-frequency-to-max resolved the issue before, as they all eliminated this pendulum effect the cores were experiencing.
As for how to fix it.. tough to say. I had a brief talk with refraction and, well, in my opinion it appears unlikely something might be done about it. I suggested a hack of implementing a dummy thread (and a relevant setting in the emulator to enable it) as a means of giving users the ability for the emulator to keep the CPU wide awake, without having to change the power plan (or adjust OC freq) like I'd have to do, but that doesn't look desirable. Making spinlock an "official" workaround instead seems bad from my end because the EE counter would become useless, always being at 100%.
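For reference, the dummy-thread hack I suggested would look roughly like the sketch below. This is not PCSX2 code and nothing like it was accepted; KeepAwakeThread and its members are hypothetical names. It simply burns one core so the clock governor keeps seeing load, with the obvious cost in power and heat.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical sketch of a "keep the CPU awake" thread. It does no useful
// work; it only spins until told to stop.
class KeepAwakeThread
{
public:
    ~KeepAwakeThread() { Stop(); }

    void Start()
    {
        assert(!m_thread.joinable()); // must not already be running
        m_running.store(true);
        m_thread = std::thread([this] {
            while (m_running.load(std::memory_order_relaxed))
                m_spins.fetch_add(1, std::memory_order_relaxed);
        });
    }

    void Stop()
    {
        m_running.store(false);
        if (m_thread.joinable())
            m_thread.join();
    }

    // Number of loop iterations so far; only useful to confirm it is alive.
    std::uint64_t Spins() const { return m_spins.load(); }

private:
    std::atomic<bool> m_running{false};
    std::atomic<std::uint64_t> m_spins{0};
    std::thread m_thread;
};
```

Usage would be Start() when emulation begins and Stop() when it pauses or exits, ideally behind a setting that defaults to off.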
So it seems that for now I'll just mess with the OC on my end to work around this, and hopefully the knowledge of this very particular and odd situation will remain in people's memories so that they may advise someone else about it should they be so darn unlucky to follow in my footsteps, lol
Lastly, you might also ask why this took so long to figure out! Well, it was suggested, but wasn't all that obvious. The frequency changes were happening so fast, yet at such regular intervals, that the tools I was using were not catching them in time to compare with the framerate dips. When a low clock number was reported, it was so out of sync with the dip itself that I just glossed over it, and so we ended up testing a bunch of things in vain -- not that I do not appreciate the effort and willingness of everyone to participate and try to narrow it down or reproduce it from their end.
I had to up the clock frequency update rate on the monitor to properly catch this in the act, at which point a few short tests confirmed the suspicion for good. I've always had a knack for coming across unusual issues when it comes to computers!
===================== ORIGINAL MESSAGE BELOW =====================
As the title implies, the emulator suffers from inconsistency in the frame limiter when it is enabled, with the fps dipping below the PAL/NTSC standard despite the emulator having actual leeway to go much faster. This, at present, exists all the way to the latest dev version with absolute certainty.
Reproducing it is easy. Pick a lightweight title, be it PAL or NTSC, and navigate to some spot you prefer. Ensure the title you picked can run well over 100% of the standard framerate when the limiter is off, to ensure your framerate isn't limited by the system performance.
Once you are in a suitable spot, either wait for a bit and monitor the framerate through the OSD or just move the camera around. During my testing, the FPS through the OSD is unstable. While versions from 953 and back are nearly rock solid at 59.94 fps (NTSC), 954 and later show instability with it fluctuating often between the 59-61 range, and occasionally dipping way lower, by as much as 1/3. Often in the 52 range, sometimes even lower. EE usage also seems to spike during those moments for whatever reason, but this behavior does not occur when the limiter is off and the game runs as fast as it is able.
It would appear that the changes merged from 954-955 somehow brought about this situation that affects all games I have tried so far. My testing platform for narrowing down the revision was the original Splinter Cell title, as it is very easy to start the game fresh and get into the game in its training area real quick, then hang from the first wall and just pan the camera around to check the limiter's consistency.
For reference, system specs are:
Windows 10 20H2
i7 8700K @ 4.8 Ghz
RTX 2080
refraction told me to link #4214 here so I hope this looks right. If any pictures for reference are needed let me know and I'll procure a couple, or a video maybe.
Lastly, I should point out that while I do use MTVU, this was tested with it both on and off, and occurs across all renderers. I would assume it also applies to SW but I can't well run games at 60 fps that way so I can't tell for sure.