MangoHud disagrees with UMR about GPU load

Venemo commented 3 years ago

I'm running Horizon Zero Dawn from Steam with the latest Mesa on an AMD RX 6900 XT GPU. I use MangoHud to see the frame rate and some stats, because the game doesn't have this feature built in.

In this game, MangoHud always reports a very low GPU load, which looks very suspicious. So I also looked at GPU load with umr --top and it strongly disagrees with MangoHud.

Consider this screenshot: https://drive.google.com/file/d/1WZZj9JQSKQ7zlcC8m922OJpQydmXg5y7/view?usp=sharing

On the left side you can see umr --top which reports that GPU_LOAD is 99%, and on the right you can see the game with MangoHud which reports GPU load at 0%. (Note that even though the screenshot shows 0%, MangoHud doesn't always report 0, it varies between 0% and about 40%.)

jackun commented 3 years ago

Check cat /sys/class/drm/card*/device/gpu_busy_percent

kokoko3k commented 3 years ago

I'm observing similar issues on my 5600 XT. /sys/class/drm/card*/device/gpu_busy_percent itself is showing wrong values, so it is not a MangoHud issue. That said, other utilities like radeontop or even umr give more reliable results by averaging multiple samples. According to https://www.kernel.org/doc/html/latest/gpu/amdgpu.html, it seems to be a firmware issue, so there is nothing the kernel can do about it. For now I'm using my own workaround with the following code:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

static int *values;
static int nsamples = 0;
static double timeout = 1.0;
static const char *fname, *foutname;

/* Read one sample into slot i, then report the average of the first `count` slots. */
void loop(int i, int count) {
    FILE *fp = fopen(fname, "r");
    unsigned long int sum;
    int j;
    values[i] = 0; // counts as 0 if the file cannot be opened or parsed
    if (fp) {
        if (fscanf(fp, "%d", &values[i]) != 1) values[i] = 0;
        fclose(fp);
    }
    for (j = 0, sum = 0; j < count; j++) sum += values[j];
    double avg = sum / (double) count;
    printf("%0.2lf\n", avg); // print continuously
    // ...but dump to a file every time we start to fill a new window:
    if (i == 0) {
        FILE *fpout = fopen(foutname, "w");
        if (fpout) {
            fprintf(fpout, "%f", avg); // write the averaged value to the file
            fclose(fpout);
        }
    }
    usleep(1000000 * timeout);
}

int main(int argc, const char *argv[]) {
    if (argc != 5 || access(argv[3], R_OK) != 0) {
        fprintf(stderr, "Sample values from \"filename-input\" at rate \"samples-per-second\" and smooth them over a moving window sized \"window-size\"\n");
        fprintf(stderr, "dump continuously on screen and less frequently to \"filename-output\"\n\n");
        fprintf(stderr, "Usage: %s <samples-per-second> <window-size> <filename-input> <filename-output>\n", argv[0]);
        return 1;
    }
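    double rate = atof(argv[1]);
    timeout = (rate > 0) ? 1 / rate : 1.0; // fall back to one sample per second
    nsamples = atoi(argv[2]);
    fname = argv[3];
    foutname = argv[4];
    if (nsamples <= 0) nsamples = 5; // an empty window would never sample and would spin forever
    values = malloc(nsamples * sizeof(int));
    if (!values) return 1;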
    int i;
    for (i = 0; i < nsamples; i++) loop(i, i+1);
    while (1) for (i = 0; i < nsamples; i++) loop(i, nsamples);
    return 0;
}

./average_and_mirror 50 50 /sys/class/drm/card0/device/gpu_busy_percent /tmp/averaged >/dev/null

It samples the value 50 times per second, builds a sliding average out of it, and periodically dumps the result to /tmp/averaged. MangoHud picks it up via:

    gpu_stats
    exec=cat /tmp/averaged

so that on the left I have the old value and on the right the averaged one.

Could this please be implemented in mangohud directly?
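
For reference, the sliding average above can also be kept with a running sum, so each new sample costs constant time instead of re-adding the whole window. This is only a minimal sketch and not MangoHud code; the `moving_avg` names and the `WINDOW_MAX` limit are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_MAX 64  /* hypothetical upper bound on the window size */

/* Sliding-window average that keeps a running sum, so each update is O(1). */
struct moving_avg {
    int samples[WINDOW_MAX];
    size_t size;   /* configured window size */
    size_t count;  /* samples stored so far, at most size */
    size_t next;   /* slot the next sample overwrites */
    long sum;      /* sum of the stored samples */
};

void moving_avg_init(struct moving_avg *m, size_t size) {
    assert(size > 0 && size <= WINDOW_MAX);
    m->size = size;
    m->count = 0;
    m->next = 0;
    m->sum = 0;
}

/* Add one sample and return the average over the current window. */
double moving_avg_push(struct moving_avg *m, int sample) {
    if (m->count == m->size)
        m->sum -= m->samples[m->next];  /* window full: drop the oldest sample */
    else
        m->count++;
    m->samples[m->next] = sample;
    m->sum += sample;
    m->next = (m->next + 1) % m->size;
    return (double)m->sum / (double)m->count;
}
```

Calling `moving_avg_init(&m, 50)` once and then `moving_avg_push(&m, value)` for every reading would match the 50-sample window used in the command above.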

jackun commented 3 years ago

Try to build develop branch yourself for now.

kokoko3k commented 3 years ago

I haven't tested it, because I don't have an AMD GPU right now. However, what about the following in gpu.cpp:

[..]
    if (amdgpu.busy) {
        /*
            int value = 0;
            if (fscanf(amdgpu.busy, "%d", &value) != 1)
                value = 0;
            gpu_info.load = value;
        */

        int i, sample, sum = 0, samples_max = 50;
        int sleep_time = int(1000000 / samples_max / 2);  // <-- spend at most 1/2 second to sample
        for (i = 0; i < samples_max; i++) {
            // Re-read from the start each time; otherwise every read after the first hits EOF.
            rewind(amdgpu.busy);
            fflush(amdgpu.busy);
            if (fscanf(amdgpu.busy, "%d", &sample) != 1)
                sample = 0;
            sum += sample;
            usleep(sleep_time);
        }
        gpu_info.load = int(sum / samples_max);
    }
[..]

> Try to build develop branch yourself for now.

Whoops, maybe I misunderstood. Is there already some work done to address this issue?

kokoko3k commented 3 years ago

Indeed, it seems to work flawlessly in development branch, thanks.

kokoko3k commented 3 years ago

Tested a bit more and it still varies too much. In Dying Light, I'm facing the ground and GPU use still swings between 50% and 70%, whereas radeontop is between 65% and 70%. More samples needed?

jackun commented 3 years ago

By default MangoHud samples for 500 ms (60 ticks), while radeontop samples for 1 s (120 ticks). That's probably why.
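
Roughly, tick-based sampling counts how many polls found the GPU busy and divides by the number of polls, so a shorter window gives a coarser and noisier figure. A minimal sketch, assuming the busy flag is bit 31 of the polled status register (which is how I understand radeontop treats GRBM_STATUS; treat that as an assumption) and that the ticks have already been collected. `busy_percent` and `GUI_ACTIVE_BIT` are hypothetical names, not MangoHud or radeontop identifiers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUI_ACTIVE_BIT (1u << 31)  /* assumed position of the "busy" flag */

/* Percentage of ticks in which the sampled register had the busy flag set. */
unsigned busy_percent(const uint32_t *ticks, size_t n) {
    assert(n > 0);
    size_t busy = 0;
    for (size_t i = 0; i < n; i++)
        if (ticks[i] & GUI_ACTIVE_BIT)
            busy++;
    return (unsigned)(busy * 100 / n);
}
```

With 60 ticks, each tick is worth about 1.7 percentage points; with 120 ticks, about 0.8. The longer window also smooths over more short idle gaps.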

kokoko3k commented 3 years ago

Yeah, I was thinking: why not ignore bogus values and build an average from the other ones? Even with 120 samples per second, the average doesn't seem trustworthy. Dying Light, facing the ground, and menu screen, everything static:

koko@slimer# time for i in $(seq 1 120) ; do cat /sys/class/drm/card1/device/gpu_busy_percent ; sleep 0.006 ; done  | tr \\n "," 
99,1,1,3,98,6,97,6,74,99,77,92,66,99,1,1,3,98,24,83,24,87,49,80,99,67,99,1,99,12,99,3,3,12,93,12,58,99,84,99,75,99,16,97,7,97,6,96,24,89,49,65,97,3,98,3,98,2,98,3,98,12,95,24,84,99,1,99,1,99,3,97,6,96,24,89,99,1,99,1,99,3,3,3,98,12,93,99,18,99,35,99,1,99,3,97,10,67,99,1,98,55,99,33,99,1,1,6,97,6,83,49,75,98,60,99,1,99,1,1
real    0m0,924s
user    0m0,134s
sys     0m0,063s

You can see that in roughly one second and 120 iterations, there are about 30 values that are wrong, IMHO. I propose to fill an array with samples, build an average out of it, then strip the values that differ too much from that average and rebuild another average without those. Or maybe we have to trust those values and the GPU is effectively sleeping here and there?
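
The two-pass idea could look something like the sketch below. It is untested against real hardware; the function name `trimmed_average` and the `max_dev` threshold are made up for illustration, and nothing here is taken from MangoHud:

```c
#include <assert.h>
#include <stddef.h>

/* Average `n` samples, then average again using only the samples that lie
 * within `max_dev` of the first average. Falls back to the plain average
 * if every sample would be rejected. */
double trimmed_average(const int *samples, size_t n, double max_dev) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    double mean = sum / (double)n;

    double kept_sum = 0.0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        double dev = samples[i] - mean;
        if (dev < 0)
            dev = -dev;
        if (dev <= max_dev) {  /* keep only samples close to the first average */
            kept_sum += samples[i];
            kept++;
        }
    }
    return kept ? kept_sum / (double)kept : mean;
}
```

One caveat: when the samples split into two clusters, as in the log above, the first average lands between them, so the threshold decides which cluster survives. A median might be a more robust starting point.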

jackun commented 3 years ago

> Or maybe we have to trust those values and the gpu is effectively sleeping here and there?

If the register says it is active/idle then it is or it's a hw/fw bug :shrug:

But gpu_busy_percent itself seems to be a best-effort guess by the SMU.

kokoko3k commented 3 years ago

So basically we are left out in the cold. On a partially (un)related note, I was thinking about adding a feature to MangoHud to read a value from a specified file, so that one can bypass the useless cat(s) currently needed with the exec function. Do you think it would be useful and have a chance of being merged? Also, I have never made a PR, so bear with me :)
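
To make the idea concrete, the reading side might be as small as the sketch below. The function name `read_value_from_file` is hypothetical and no such MangoHud option exists as far as this thread shows; it only illustrates reading a number straight from a file such as /tmp/averaged instead of spawning cat:

```c
#include <assert.h>
#include <stdio.h>

/* Read one number from `path` into `*out`.
 * Returns 0 on success, -1 if the file is missing or holds no number. */
int read_value_from_file(const char *path, double *out) {
    assert(path != NULL && out != NULL);
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;
    int ok = fscanf(fp, "%lf", out) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Reading as a double matches the "%f" format that the averaging tool above writes to its output file.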

Atemu commented 2 years ago

So what exactly is preventing us from using the method radeontop uses to determine GPU usage instead of gpu_busy_percent?

It's clear that the current usage is so wildly inaccurate that it's nearly useless. Radeontop's isn't however. Getting somewhere close to what it does would be an immense improvement on the status quo of the GPU usage indicator.

jackun commented 2 years ago

It already is though? (v0.6.6, thought it was already on v0.6.5, oh well.) Define "wildly inaccurate"? Seems pretty samey with Vega64.

Atemu commented 2 years ago

Oh yeah, it's actually gotten a lot better it seems!

However, I'm currently sitting in GW2's Lion's Arch where Mangohud shows usage in the upper 90s while radeontop shows a more reasonable 70-80%. Turning down resolution doesn't improve FPS, so the GPU is definitely not maxed out as Mangohud suggests.

I'm not sure radeontop is 100% correct here but Mangohud is wrong.

Atemu commented 2 years ago

Aha! Didn't see your edit in time.

0.6.6 is a lot more accurate and pretty much on-par with radeontop (if a little jumpy). The issue is resolved in my eyes.

flightlessmango / MangoHud

MangoHud disagrees with UMR about GPU load #463