Closed caioraphael1 closed 3 hours ago
Yes, that's the case for me too
Please consider stripping down one of the projects down to a reasonable minimum that still displays the regressive behaviour, or if you can build from source, bisecting it, otherwise it is very difficult to locate the regression source.
I've removed the DTLS certificates and reduced the project to a minimum. The bug still happens.
I can reproduce this in an empty project. Here is what I got from VS profiler:
OS_Windows::delay_usec
, previously it use Sleep
, now it use a loop wait. Tested locally, this (df3367f334
) is the first commit it appear.Profile Data: busywait.zip
Copying relevant snippet from the PR:
"Still, this is the most precision we can possibly get here.
The problem with this is that spin-locking eats up CPU time and does absolutely nothing. This time could have been better used by other threads that actually have usefull work to do. Or if no other threads are running, the CPU core could have been switched into low-power mode in order to save power and battery life on mobile systems."
This trade-off of CPU usage for precision is likely expected, but perhaps it would be sensible (if possible) to have a project setting to toggle implementations?
This trade-off of CPU usage for precision is likely expected, but perhaps it would be sensible (if possible) to have a project setting to toggle implementations?
I don't know if the change in this PR is vital, but at least I don't think this should be enabled by default. Opening 2 simple projects editor and running both is making me go to 100% CPU consumption. I can't actually work on any of those projects, it's extremely laggy.
Doing a quick search and I understand the situation. Win 10 has Waitable Timer which is efficient and precise, but we need to support Win 7, so a loop with YieldProcessor()
is used, hoping it will release CPU. (seems not work on my computer)
I'd suggest use Waitable Timer for Win 10 and above, use busy wait only for Win 7.
Doing a quick search and I understand the situation. Win 10 has Waitable Timer which is efficient and precise, but we need to support Win 7, so a loop with YieldProcessor()
is used, hoping it will release CPU. (seems not work on my computer)
I'd suggest use Waitable Timer for Win 10 and above, use busy wait only for Win 7.
@caioraphael1 In case you need it, this is master branch with #99178 reverted: https://drive.google.com/file/d/1fjvpnflQ-taHSkgLCZX86xfz9bQzbICv/view?usp=sharing
@YYF233333 Thank you!!! CPU consumption is now at 25-35%, under the same conditions.
@YYF233333 The results are very interesting. In theory the change in https://github.com/godotengine/godot/pull/99178 should never spin with YieldProcessor()
for more than 1.2 ms per frame.
Although, now that I am looking at the usages of delay_usec()
, it seems like the audio driver often tries to delay by exactly 1ms. With this change, instead of sleeping for 1 ms, we spin for 1 ms because the threshold is 1.2 ms now.
I guess the proper solution will be to lower the threshold so that we actually sleep in such cases
What do you think @mrsaturnsan?
In theory the change in #99178 should never spin with
YieldProcessor()
for more than 1.2 ms per frame.
Actually, no more than 2.02 ms (you need to sleep at least 1 ms).
~And I think the threshold is problematic. The precision of Sleep()
is about 15.6 ms by default, means Sleep(x)
actually sleep between x ms ~ x + 15.6 ms. So if you want to sleep 15ms or shorter, you cannot afford even once Sleep()
. That tolerance
should be 15000(us)
.~
Ok, I see that timeBeginPeriod(1)
, my bad.
If precise sleep by default is causing too much spinning on the CPU, my change can be reverted and the precise sleeping can be moved to the FPS limiting code. Reason being that changing the threshold will throw off the frame-limiting and give inconsistent FPS again.
I think that should be a good balance?
Alternatively, we can add an imprecise and precise sleep function. Places that don't need accuracy can use the imprecise sleep.
Actually, it might be interesting to revisit where delay_usec is getting called from.
@clayjohn What do you think about making a precise_delay_usec and a delay_usec function?
Although, I'm not sure why you are seeing such high CPU consumption. Does your architecture not support yielding?
Regarding YieldProcessor:
Signals to the processor to give resources to threads that are waiting for them. This macro is only effective on processors that support technology allowing multiple threads running on a single processor, such as Intel's Hyperthreading technology.
https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-yieldprocessor
Created a PR for fixing the unintended CPU usage cost: https://github.com/godotengine/godot/pull/99656
I add that the problem only occurs from dev5. I used dev4 until the other day and there wasn't this huge CPU consumption with the previous build.
By the way, @mrsaturnsan I'd like to ask what's the original purpose of #99178? It doesn't link to an issue, and I haven't ran into problem with framerate, looks a bit odd to me. Why we need so stable frametime?
Why we need so stable frametime?
Unstable frametimes result in jittery motion. VRR displays can attenuate this somewhat but you get VRR flicker when frametimes are unstable, due to the gamma curve constantly changing ever so slightly as the monitor adjusts its refresh rate.
Ideally, you should strive for the frametime graph to be a perfectly flat line, no matter whether it's limited by V-Sync or a FPS limiter.
Low-latency approaches like NVIDIA Reflex or NULL[^1] intentionally introduce some amount of microstuttering to reduce latency as much as possible, but this is a compromise that's best suited to VRR displays (which not everyone has) and competitive games.
[^1]: NVIDIA Ultra Low Latency mode (affects Direct3D 11 games when the low-latency mode is set to "Ultra" in NVIDIA Control Panel).
@YYF233333 Frame-rate locking was not stable. For example, locking to 30FPS on a 60Hz+ display would give an unstable frame-rate and cause jittering. A smooth locked 30FPS is a much better experience than an unsmooth 30FPS.
The previous sleep logic was not accurate, and was miss the refresh interval leading to the jittery frame-pacing.
I guess the proper solution will be to lower the threshold so that we actually sleep in such cases
Back with test results, run 10k times for each case, Period
is the delay specified, and Tolerance
is that param in the code.
Worst case seems to have no difference, for P99 you can see that max error is 1ms - tolerance
.
@YYF233333 Can you confirm if the issue is fixed with my PR?
@YYF233333 Can you confirm if the issue is fixed with my PR?
Yes, #99656 fix this.
Tested versions
v4.3-stable to 4.4-dev5
System information
Ryzen 3300x, 16gb RAM, GTX 1660, Windows 10, installed on a SSD
Issue description
I have some extremely simple projects to open and join a High-level multiplayer server with ENet. Both projects only have a _ready function to connect to each other and are set in Compatibility Render.
In v4.3, the editor and the app running both consume ~0.1% CPU, as shown at the top of the screenshot.
Meanwhile, in v4.4-dev5, each project consumes 15-18% CPU while in the editor or with the project opened, for a total of 30-36% CPU consumption if considering the editor and project together. This is shown at the bottom of the screenshot. If I tried exporting the project in v4.4-dev5 for win10, the consumption of the exported app is the same.
I downgraded the projects back to v4.3-stable and the consumption dropped back to 0.1%.
This is happening in 4 different projects, with a mix of Forward+ and Compatibility.
I have to keep some of my projects in 4.4-dev5, as I new some of the new features for my network setup.
Steps to reproduce
Upgrade to 4.4-dev5
Minimal reproduction project (MRP)
The projects contain sensitive data.