flightlessmango / MangoHud

A Vulkan and OpenGL overlay for monitoring FPS, temperatures, CPU/GPU load and more. Discord: https://discordapp.com/invite/Gj5YmBb
MIT License

Average FPS are useless. #1095

Open NoNaeAbC opened 1 year ago

NoNaeAbC commented 1 year ago

To measure the performance of a game, or of the hardware running it, the whole frame-time record has to be compressed into a single number for further analysis. Commonly this is done using the average FPS.

There are 2 units commonly used for stating the time between two frames (the frame time, for short): FPS and ms. On the surface there is no difference, because frame rate [in FPS] = 1000 ms/s / frame time [in ms]. But there actually is a difference. I'll demonstrate this with 2 examples. First: "I have a frame time of 2.5 ms for CS:GO on my PC; if I open up a second session on my laptop I get another 25 ms, resulting in 27.5 ms to render 2 frames. This will make me a pro gamer." Second: "I get 400 fps in CS:GO on my PC; if I open up a second session I get an additional 40 fps, so I have 440 fps. This will make me a pro gamer." Both examples use the exact same line of reasoning, yet somehow having 440 fps across 2 displays does not result in an enhanced experience. This implies there are operations that make sense for ms but not for FPS. I will argue that the average FPS is one of those.
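To make the unit arithmetic concrete, here is a tiny sketch in plain Python (the helper name ms_to_fps is mine; the numbers are the ones from the examples above):

# frame rate [in FPS] = 1000 ms/s / frame time [in ms]
def ms_to_fps(frame_time_ms):
    return 1000.0 / frame_time_ms

pc_ms, laptop_ms = 2.5, 25.0                     # frame times from the CS:GO example
print(ms_to_fps(pc_ms), ms_to_fps(laptop_ms))    # 400.0 fps and 40.0 fps

# Adding the FPS numbers gives 440, but that figure corresponds to nothing physical:
# one frame on each machine takes 2.5 ms + 25 ms = 27.5 ms in total,
# i.e. 2 frames / 0.0275 s, which is roughly 72.7 fps of actual throughput.
print(ms_to_fps(pc_ms) + ms_to_fps(laptop_ms))   # 440.0
print(2 / 0.0275)                                # ~72.7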

First I want to make clear the standpoint from which all of my arguments follow. We can't perceive smoothness above 30 fps, but that doesn't mean our eyes only fire a signal every 30 ms; they fire once they catch 9 photons and then they have a cooldown. The thing that enhances the gaming experience is perceived latency. While 10 ms looks negligible compared to our 300 ms reaction time, you have to remember 2 things. 1) Reaction time consists mostly of muscle contraction and figuring out how to react, while the time required to perceive latency is lower. 2) Total latency includes the mouse and monitor response time, as well as CPU and GPU time and double buffering. Usually, for a fixed game engine and hardware, total latency is 3 to 6 times the frame time, and for the 10 ms frame time above that means around 50 ms, which sounds much more perceivable.

There are 4 ways to take the average. Let a_i be the frame time of the i-th frame, n the total number of frames, and t = \sum_{i=1}^{n} a_i the total time.

1) FPS with respect to frame count (in units of frames per ms):
   AVG_1(a) = (\sum_{i=1}^{n} (1/a_i) * 1) / (\sum_{i=1}^{n} 1) = (\sum_{i=1}^{n} 1/a_i) / n
2) FPS with respect to time (in units of frames per ms):
   AVG_2(a) = (\sum_{i=1}^{n} (1/a_i) * a_i) / (\sum_{i=1}^{n} a_i) = n / t
3) ms with respect to frame count (in units of ms):
   AVG_3(a) = (\sum_{i=1}^{n} a_i * 1) / (\sum_{i=1}^{n} 1) = t / n
4) ms with respect to time (in units of ms):
   AVG_4(a) = (\sum_{i=1}^{n} a_i * a_i) / (\sum_{i=1}^{n} a_i) = (\sum_{i=1}^{n} a_i^2) / t
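As a sanity check, here is a minimal sketch of the four definitions in plain Python (the names avg1..avg4 are mine; frame times are taken in seconds, as in the script further below):

def avg1(a):
    # FPS with respect to frame count: (sum of 1/a_i) / n
    return sum(1.0 / x for x in a) / len(a)

def avg2(a):
    # FPS with respect to time: n / t
    return len(a) / sum(a)

def avg3(a):
    # frame time with respect to frame count (arithmetic mean): t / n
    return sum(a) / len(a)

def avg4(a):
    # frame time with respect to time (time-weighted mean): sum(a_i^2) / t
    return sum(x * x for x in a) / sum(a)

frames = [0.1, 0.01, 0.01, 0.01, 0.08, 0.09, 0.2, 0.15, 0.15, 0.02, 0.03, 0.04, 0.03, 0.08]
print(avg1(frames), avg2(frames), avg3(frames), avg4(frames))
print(avg2(frames) * avg3(frames))   # ~1.0: AVG_2 and AVG_3 are reciprocals of each other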

Note that options 2 and 3 are identical: they describe the same quantity, just in reciprocal units. One general property of all of them is that they are averages, thus MIN(a) <= AVG_i(a) <= MAX(a). The third one is the simplest average (the arithmetic mean) = total time / number of frames. From the generalized AM-GM inequality it can be shown that [in ms] AVG_4(a) >= AVG_3(a) >= AVG_1(a). The first average is simply the harmonic mean; the fourth one is the arithmetic mean weighted by the execution time. I have already given the interpretation of the third one; the interpretation of the fourth one is the following: imagine you sample random frames and ask for the expected frame time. The answer to that question is the third average; for the fourth average you have to replace "sampling random frames" with "sampling random time stamps". The interpretation of the first average is the most complicated one. You need to compare two distributions with an equal number of frames in the same total time that differ only in their smoothness. The harmonic mean is a way to quantifiably prefer the less smooth one: for example, you can get an upward correction by rendering at a quadrillion fps for a nanosecond. The fourth average doesn't give you that option.
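For what it's worth, one way to see the AVG_4(a) >= AVG_3(a) part is through the variance of the frame times (notation as above):

AVG_4(a) = (\sum_i a_i^2) / t = ((1/n) \sum_i a_i^2) / AVG_3(a) = (AVG_3(a)^2 + Var(a)) / AVG_3(a) = AVG_3(a) + Var(a) / AVG_3(a) >= AVG_3(a)

where Var(a) = (1/n) \sum_i a_i^2 - AVG_3(a)^2 >= 0 is the variance of the frame times; AVG_3(a) >= AVG_1(a) [in ms] is then the usual AM >= HM inequality. The gap between AVG_4 and AVG_3 is literally the stutter (variance) divided by the mean frame time.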

Here is a Python script to play around with it.

# frame times in seconds (they add up to about 1 s of total time)
arr = [0.1,0.01,0.01,0.01,0.08,0.09,0.2, 0.15, 0.15, 0.02, 0.03, 0.04, 0.03, 0.08]

print(f"total time {sum(arr)}")

fps = [1/x for x in arr]

print(f"frame time in s : {arr}")
print(f"fps : {fps}")

# AVG_3: arithmetic mean of the frame times (total time / number of frames)
avg3 = 0

for i in arr:
        avg3 += i

avg3 /= len(arr)

print(f"AVG_3(n) in s : {avg3}")
print(f"AVG_3(n) in fps : {1 / avg3}")

# AVG_1: arithmetic mean of the FPS values, inverted back to a frame time
# (this is the harmonic mean of the frame times)
avg1 = 0

for i in fps:
        avg1 += i

avg1 /= len(fps)

avg1 = 1 / avg1

print(f"AVG_1(n) in s : {avg1}")
print(f"AVG_1(n) in fps : {1 / avg1}")

# AVG_4: time-weighted mean of the frame times, sum(a_i^2) / t
avg4 = 0
for i in arr:
        avg4 += i*i

avg4 /= sum(arr)  # divide by the total time t (about 1 s for this data, so the value barely changes)

print(f"AVG_4(n) in s : {avg4}")
print(f"AVG_4(n) in fps : {1 / avg4}")

Output

total time 1.0000000000000002
frame time in s : [0.1, 0.01, 0.01, 0.01, 0.08, 0.09, 0.2, 0.15, 0.15, 0.02, 0.03, 0.04, 0.03, 0.08]
fps : [10.0, 100.0, 100.0, 100.0, 12.5, 11.11111111111111, 5.0, 6.666666666666667, 6.666666666666667, 50.0, 33.333333333333336, 25.0, 33.333333333333336, 12.5]
AVG_3(n) in s : 0.07142857142857144
AVG_3(n) in fps : 13.999999999999998
AVG_1(n) in s : 0.02766190998902305
AVG_1(n) in fps : 36.15079365079365
AVG_4(n) in s : 0.12000000000000001
AVG_4(n) in fps : 8.333333333333332

Note that while for half of the total time the fps is below 7, the average fps still manages to climb to 36. Obviously AVG_4 is the only sensible one for measuring performance. In particular, I can't imagine how someone can keep a straight face while calling the average FPS a good measurement in one sentence and stressing how important a smooth gaming experience is in the next.

Another impact on performance comes from lag spikes. I don't want to start a discussion about why they happen, only about how to measure them. The industry standard seems to have become the average fps of the lowest 1% of frames: the more it deviates from the overall average, the more lag spikes the game suffered. I want to classify lag spikes into 2 categories. In the first category, a frame takes longer than 30 ms to render, which messes with our perception of the video. In the second category, a smooth mouse movement is sampled inconsistently and turned into an inconsistent camera movement, but the frame times themselves are consistent. In reality lag spikes are usually not one or the other but a combination of both. The problem is that 1% lows can only measure the contribution of the first category. Thus 1% lows are a proxy: they don't measure the lag spikes, but they correlate with them. The issue with proxies is that you can say the game runs like a mess, but you are unable to make a quantitative statement.
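For reference, a minimal sketch of one common way to compute a 1% low figure, assuming it means the average FPS over the slowest 1% of frames (definitions vary between reviewers and tools):

def one_percent_low_fps(frame_times_s):
    # slowest 1% of frames, at least one frame
    slowest_first = sorted(frame_times_s, reverse=True)
    worst = slowest_first[:max(1, len(slowest_first) // 100)]
    # frames divided by seconds gives the average FPS over that worst slice
    return len(worst) / sum(worst)

frames = [0.1, 0.01, 0.01, 0.01, 0.08, 0.09, 0.2, 0.15, 0.15, 0.02, 0.03, 0.04, 0.03, 0.08]
print(one_percent_low_fps(frames))   # 5.0 here, dominated by the single 0.2 s frame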

The issue: while MangoHud captures the average fps, it is a useless measurement, and using it can lead to bad conclusions. This only becomes a problem when comparing 2 frame time distributions where one is smooth and the other one isn't.

The fix: Use AVG_4 instead.

The problems with this solution:

Etaash-mathamsetty commented 1 year ago

I think this is a good proposal, but one that probably won't be enabled by default :)

mupuf commented 9 months ago

@NoNaeAbC: You are absolutely right that average FPS is meaningless on its own... but actually any "single" metric will always be somewhat useless and not indicative of the overall experience. The only reason you feel like you can use one is that you assume a model for the distribution of frame times... which is unrealistic. Sooooo, what do people do when they don't know the distribution of their samples? They use statistics: medians, percentiles, ...

As for using FPS or frame times: I agree that frame times feel more interesting, but the reality is that humans have been bombarded with FPS as a metric and have come to associate it with a certain experience. This is why reviewers ended up tracking 1% lows on top of the average. If anything, I feel like showing the median FPS along with the 1% lows would make more sense...
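A minimal sketch of what that could look like, reusing the example frame times from above (the 1% low definition is the same assumption as in the earlier sketch):

import statistics

frames = [0.1, 0.01, 0.01, 0.01, 0.08, 0.09, 0.2, 0.15, 0.15, 0.02, 0.03, 0.04, 0.03, 0.08]

median_fps = 1.0 / statistics.median(frames)     # invert the median frame time

slowest = sorted(frames, reverse=True)[:max(1, len(frames) // 100)]
low_1pct_fps = len(slowest) / sum(slowest)       # average FPS over the slowest 1%

print(f"median: {median_fps:.1f} fps, 1% low: {low_1pct_fps:.1f} fps")   # 16.7 fps and 5.0 fps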

In any case, I believe we can't change the defaults... but I am all for exposing more ways of representing the frame times! So feel free to make a PR to add that!