BotBlake / pyTAB

Python Transcoding Acceleration Benchmark Client made for Jellyfin Hardware Survey
GNU General Public License v3.0

Incorrect results being reported. #20

Closed mcarlton00 closed 1 month ago

mcarlton00 commented 2 months ago

Description

Multiple commands run under each test, but results for only one of them are reported to the server.

Steps to Reproduce

  1. Run the thing

Expected Behavior

Each test should have exactly one result.

Actual Behavior

The tests reported by the server contain multiple commands. These commands are not equivalent, and appear to encode to different target formats (e.g. one encodes to h264 and the other to h265). The results section then contains only the results of the most recent run. This seems to completely defeat the purpose of the test: not only are two completely different capabilities being exercised under one test ID, but results are reported for only one of them.
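For illustration, here is a minimal sketch of the overwrite pattern (hypothetical names and structure, not pyTAB's actual code): when per-test results are stored keyed only by the test ID, the second command's results replace the first's.

```python
# Hypothetical reconstruction of the overwrite pattern; pyTAB's real
# internals may differ. Two commands share one test ID, and results
# are keyed by that ID alone.
results = {}

test_id = "d5c7f3fe-09ea-3572-f81a-7f33b3d75ab0"
commands = [
    {"codec": "h264", "max_streams": 5},  # first command: reached 5 workers
    {"codec": "h265", "max_streams": 1},  # second command: failed early
]

for cmd in commands:
    # Each iteration clobbers whatever the previous command stored.
    results[test_id] = {"max_streams": cmd["max_streams"]}

# Only the last command's result survives:
print(results[test_id])  # {'max_streams': 1}
```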

Additional Context

Live run output:

```
Running test: d5c7f3fe-09ea-3572-f81a-7f33b3d75ab0
> > > Current Device: amd
> > > > Workers: 1, Last Speed: -0.5
> > > > Workers: 6, Last Speed: 5.296
> > > > Scaling back to: 5, Last Speed: 0.8728617948717949
> > > > Scaleback success! Limit: False, Total Workers: 5, Speed: 1.0502419913419914
> > > > Failed: ['performance']
Running test: d5c7f3fe-09ea-3572-f81a-7f33b3d75ab0
> > > Current Device: amd
> > > > Workers: 1, Last Speed: -0.5
> > > > Failed: ['generic_ffmpeg_failure']
```

I added an extra debug statement to print the current test ID. You can clearly see that this test ran twice, indicating two "commands". The first one (presumably encoding to h264) reported a max stream value of 5. The second one (presumably encoding to h265) failed entirely.
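The debug statement was along these lines (a hedged sketch; the field names and loop structure are assumptions, not pyTAB's actual code):

```python
# Hypothetical sketch of the added debug line; field names and the
# surrounding loop are assumed, not taken from pyTAB.
test = {
    "id": "d5c7f3fe-09ea-3572-f81a-7f33b3d75ab0",
    "ffmpeg_commands": ["<h264 command>", "<h265 command>"],  # assumed field
}

for command in test["ffmpeg_commands"]:
    # Printed once per command, so a test with two commands shows the
    # same test ID twice in the live output.
    print(f"Running test: {test['id']}")
```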

output.json:

```json
{
    "id": "d5c7f3fe-09ea-3572-f81a-7f33b3d75ab0",
    "type": "amd",
    "selected_gpu": 0,
    "selected_cpu": null,
    "runs": [
        {
            "workers": 1,
            "frame": 900,
            "speed": 5.296,
            "time_s": 5.509,
            "rss_kb": 295244.0,
            "avgFPS": 158.8
        }
    ],
    "results": {
        "max_streams": 1,
        "failure_reasons": [
            "performance"
        ],
        "single_worker_speed": 5.296,
        "single_worker_rss_kb": 295244.0
    }
},
```

However, here in the output.json that gets uploaded to the server, my max streams are reported as 1. We've lost the max of 5 from the first run.

Possible Solution

IMO this is a design flaw of the system rather than an implementation-specific problem with pyTAB itself. Each test should do exactly one thing; I'm not sure why it is designed to do two unrelated things. Having multiple "commands" under the same test ID ends with results being overwritten, which invalidates the results and makes everything currently on the server suspect at best.
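If the multi-command design were kept, one client-side mitigation would be to key results by both the test ID and the command index, so a later command cannot clobber an earlier one. A sketch under those assumptions (hypothetical structure, not pyTAB's actual code):

```python
# Sketch: accumulate one result entry per (test_id, command_index)
# instead of one per test, so runs of different commands under the
# same test cannot overwrite each other. Hypothetical structure.
results: dict[tuple[str, int], dict] = {}

def record(test_id: str, command_index: int, result: dict) -> None:
    key = (test_id, command_index)
    if key in results:
        raise ValueError(f"duplicate result for {key}")
    results[key] = result

record("d5c7f3fe-09ea-3572-f81a-7f33b3d75ab0", 0,
       {"max_streams": 5, "failure_reasons": ["performance"]})
record("d5c7f3fe-09ea-3572-f81a-7f33b3d75ab0", 1,
       {"max_streams": 0, "failure_reasons": ["generic_ffmpeg_failure"]})

# Both commands' results are preserved:
for (tid, idx), res in results.items():
    print(tid, idx, res)
```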

BotBlake commented 1 month ago

Server-side changes were made to ensure each test gets its own Test-ID. This should solve the issue, but it requires further client testing to see whether any client code needs to be corrected.
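As a quick sanity check during that client testing, one could assert that the test list fetched from the server no longer reuses IDs. A sketch, with the payload shape assumed from the output.json excerpt above:

```python
from collections import Counter

# Hypothetical payload: a list of test definitions as fetched from the
# server; only the "id" field is taken from the excerpt above.
tests = [
    {"id": "aaaa-1111", "type": "amd"},
    {"id": "bbbb-2222", "type": "amd"},
]

# After the server-side change, every test ID should appear exactly once.
counts = Counter(t["id"] for t in tests)
duplicates = {tid: n for tid, n in counts.items() if n > 1}
assert not duplicates, f"duplicate test IDs found: {duplicates}"
print("all test IDs unique")
```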