fabiangreffrath / woof

Woof! is a continuation of the Boom/MBF bloodline of Doom source ports.
GNU General Public License v2.0
186 stars 32 forks source link

FPS limiter tweaks #1744

Closed rfomin closed 3 weeks ago

rfomin commented 1 month ago

I tested this with direct3d9 backend, resolution scale 100%, vsync off, frame limiter 200. With CapFrameX I've got a slightly better frame pacing with about the same CPU load. @ceski-1 I am interested in your results.

ceski-1 commented 4 weeks ago

Same results as here: https://github.com/fabiangreffrath/woof/pull/1728#issuecomment-2166013695

This PR compared to master...

Desktop (i9-9900K) Laptop (i5-2520M)
CPU usage and power Lower than master Same as master
Frame pacing Worse than master Same as master
Input lag approximation Same as master Same as master
rfomin commented 4 weeks ago

Same results as here: #1728 (comment)

Is this PR good enough? I'd rather not use Sleep(0)/yield as it's not recommended and doesn't change anything for me.

ceski-1 commented 4 weeks ago

Is this PR good enough?

I don't know yet. This fixes the inconsistent frametimes by forcing the timer resolution to 0.5ms, but I don't like relying on undocumented features:

diff --git a/src/i_timer.c b/src/i_timer.c
index d492e5a7..77a21ccb 100644
--- a/src/i_timer.c
+++ b/src/i_timer.c
@@ -27,6 +27,9 @@
 #ifdef _WIN32
 #define WIN32_LEAN_AND_MEAN
 #include <windows.h>
+#pragma comment(lib, "ntdll.lib")
+typedef LONG NTSTATUS, *PNTSTATUS;
+#define STATUS_SUCCESS 0x00000000

 HANDLE hTimer = NULL;
 #else
@@ -152,6 +155,14 @@ void I_InitTimer(void)
     }

 #ifdef _WIN32
+    NTSYSAPI NTSTATUS NTAPI NtQueryTimerResolution(PULONG, PULONG, PULONG);
+    ULONG min, max, current;
+    if (NtQueryTimerResolution(&min, &max, &current) == STATUS_SUCCESS)
+    {
+        NTSYSAPI NTSTATUS NTAPI NtSetTimerResolution(ULONG, BOOLEAN, PULONG);
+        NtSetTimerResolution(max, true, &current);
+    }
+
     // Create an unnamed waitable timer.
     hTimer = CreateWaitableTimer(NULL, TRUE, NULL);
     if (hTimer == NULL)

Also: https://github.com/tebjan/TimerTool

rfomin commented 4 weeks ago

@ceski-1 Does https://github.com/fabiangreffrath/woof/pull/1744/commits/f84edd8579cc7d54cae7b57b201e2cdf16c5bc2d improve frametimes?

ceski-1 commented 4 weeks ago

@ceski-1 Does f84edd8 improve frametimes?

No, it's the same. I should note that if the in-game limiter is disabled (framerate limit 0, vsync off), and then the Nvidia limiter is used instead, it has the same good frame pacing as the master branch, the same high cpu usage (so probably busy waiting as well), so maybe this is just the unavoidable trade-off for an accurate framerate limiter.

nvcpl

rfomin commented 4 weeks ago

maybe this is just the unavoidable trade-off for an accurate framerate limiter.

With high resolution timer we can try just this:

diff --git a/src/i_video.c b/src/i_video.c
index aa43762b..b6bfdf6b 100644
--- a/src/i_video.c
+++ b/src/i_video.c
@@ -792,10 +792,7 @@ void I_FinishUpdate(void)
                 break;
             }

-            if (target_time - elapsed_time > 1000)
-            {
-                I_SleepUS(500);
-            }
+            I_SleepUS(target_time - elapsed_time);
         }
     }
     else

Frame pacing is worse on my system. It seems that sleep is still not precise enough.

fabiangreffrath commented 4 weeks ago

Could you try the no-loop-unrolling hack from #1728 in this branch?

rfomin commented 4 weeks ago

Could you try the no-loop-unrolling hack from #1728 in this branch?

We only have a few instructions in the loop, so it doesn't matter if it's unrolled/inlined/cached or not, API calls (usleep or WinAPI timer) will affect performance much more.

Anyway, I tried, there is no difference for me.

ceski-1 commented 4 weeks ago

@ceski-1 Does https://github.com/fabiangreffrath/woof/pull/1744/commits/f84edd8579cc7d54cae7b57b201e2cdf16c5bc2d improve frametimes?

Actually, I retested this one several more times, as well as the one right before it. The CREATE_WAITABLE_TIMER_HIGH_RESOLUTION change in https://github.com/fabiangreffrath/woof/commit/f84edd8579cc7d54cae7b57b201e2cdf16c5bc2d actually does fix the frame times so now it matches the master branch but with lower CPU usage.

i9-9900K ![01_bars](https://github.com/fabiangreffrath/woof/assets/56656010/50a2dd50-b52b-432c-9ecb-f75d9fbabd99) ![02_master](https://github.com/fabiangreffrath/woof/assets/56656010/54a48074-1e75-42a3-a539-dd70671ccd30) ![03_gendlin](https://github.com/fabiangreffrath/woof/assets/56656010/0ae47f24-b758-4381-a9b3-0696f3fbc829) ![04_rfomin1](https://github.com/fabiangreffrath/woof/assets/56656010/8896021b-e0d2-4cfd-a8f0-634ac67bca7b) ![05_rfomin2](https://github.com/fabiangreffrath/woof/assets/56656010/7f8a007c-5e3a-4d98-891f-76625413811c)
i5-2520M (no change) ![06_bars](https://github.com/fabiangreffrath/woof/assets/56656010/3199f4d6-b442-410f-8c4b-07ee99c5c232) ![07_master](https://github.com/fabiangreffrath/woof/assets/56656010/a083ce02-96b6-41f3-a621-7f30c7c890b7) ![08_gendlin](https://github.com/fabiangreffrath/woof/assets/56656010/07d34ece-b5ea-46b2-9ed9-78a13570eec6) ![09_rfomin1](https://github.com/fabiangreffrath/woof/assets/56656010/0ed2375c-6919-4c4d-8a38-b59fdda332ba) ![10_rfomin2](https://github.com/fabiangreffrath/woof/assets/56656010/5c9f1c89-173c-450d-967a-6915bef34a9a)
rfomin commented 4 weeks ago

Actually, I retested this one several more times, as well as the one right before it. The CREATE_WAITABLE_TIMER_HIGH_RESOLUTION change in f84edd8 actually does fix the frame times so now it matches the master branch but with lower CPU usage.

That's great! Then I think this PR is ready.

@MrAlaux Could you please try the artifact from this PR on Win7?

MrAlaux commented 4 weeks ago

@MrAlaux Could you please try the artifact from this PR on Win7?

Sure thing, but I'm a bit busy at the moment, so it might take me a while to get to it.

What exactly do I have to check, though?

rfomin commented 4 weeks ago

What exactly do I have to check, though?

If it runs, it may fail at timer initialization. It will probably work, I will test it in Win7 VM.

rfomin commented 4 weeks ago

I have tested on a Win7 VM, it works now. I consider this PR complete, so if there are no objections I'd like to merge this.

MrAlaux commented 3 weeks ago

I finally gave this a try, setting the FPS limit to 45, and it feels like this PR is more responsive than the previous master. Needless to say, I didn't catch any errors, crashes and whatnot -- assuming that's what a "timer initialization failure" would be anyways.