els0r / goProbe

High-performance IP packet metadata aggregation and efficient storage and querying of flows
GNU General Public License v2.0

Add automated benchmarks in CI pipeline #238

Open · fako1024 opened 1 year ago

fako1024 commented 1 year ago

In order to track the performance side effects of changes, it would be enormously helpful to have automated benchmarks / a comparison via benchstat as part of the CI pipeline (maybe not on each commit, but e.g. when a PR is filed). Of course this is inherently difficult (because of reproducibility, or rather the lack thereof), but I've seen other projects do it.

DoD

fako1024 commented 1 year ago

Options:

Additional info / documentation:

fako1024 commented 1 year ago

This is becoming more and more of a PITA... In general, automating the benchmarks is more or less trivial (in theory we wouldn't even need one of the marketplace "apps"; a trivial bash script checking out HEAD and the current version, running the benchmarks twice and comparing them via benchstat would do the trick, see the sketch at the end of this comment), BUT: achieving reproducible results, even within a single contained run, is near impossible, even when using my own runner (simply because even on my servers other stuff is running in parallel). Since we're not interested in 200% regressions but rather in any deviation on the order of a few percent, we'd need:

Setting up such a machine isn't an issue (I even have old devices at my disposal that would suffice), but keeping it running all the time is something I'm not too happy about in my home environment (dependency on it being up / available, plus cost). I see two options:

  1. Set up some magic that starts / stops the instance on demand from the GitHub workflow (shouldn't be too hard, as I already have an API exposed for stuff like that). This wouldn't solve the availability issue, though, and would introduce more complexity.
  2. Find some environment where we can run a machine 24/7 and set up a runner on it (probably for both goProbe and slimcap).

@els0r All things considered, do you think OS would "sponsor" an old device (or alternatively / at least a network port + power) for us to use in the projects? I know that the "friends-zone" was removed a while ago, but this would be in the interest of OS itself, I guess.

And BTW, I do understand that this seems like overkill at first glance, but the thing is: we're currently blind to performance regressions, both explicit and implicit ones (from dependencies), unless someone manually runs the benchmarks, which is still difficult to do in a reproducible manner and requires a machine that is completely idle for quite some time. And the more people contribute, the more difficult this will become. So I'd see this not only as a means to automate the benchmarks, but also as one to run them in the most consistent way possible, or at all.
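For reference, a minimal sketch of the comparison script mentioned above, assuming Go benchmarks in the repo and benchstat (golang.org/x/perf/cmd/benchstat) available on the runner; the ref name, output paths and repetition count are illustrative, not a final implementation:

```bash
#!/usr/bin/env bash
# Sketch: compare benchmarks of the current checkout against a base ref via benchstat.
# BASE_REF, output paths and COUNT are illustrative assumptions.
set -euo pipefail

BASE_REF="${1:-origin/main}"      # ref to compare against
COUNT=10                          # repetitions per variant, so benchstat has enough samples
HEAD_SHA="$(git rev-parse HEAD)"  # remember where we started

# Benchmark the base ref
git checkout --quiet "$BASE_REF"
go test -run='^$' -bench=. -benchmem -count="$COUNT" ./... > /tmp/bench-base.txt

# Benchmark the current version (PR head)
git checkout --quiet "$HEAD_SHA"
go test -run='^$' -bench=. -benchmem -count="$COUNT" ./... > /tmp/bench-head.txt

# Summarize the deltas
benchstat /tmp/bench-base.txt /tmp/bench-head.txt
```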

fako1024 commented 1 year ago

As discussed: I'll set up a prototype machine on my personal infrastructure, optimize it for running benchmarks and fire up a GitHub runner on that. Once we've established that this actually works, we can think about next steps.

See also details / progress here: https://github.com/fako1024/slimcap/issues/78
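As a rough sketch of what "optimizing the machine for benchmarks" could mean in practice, assuming a dedicated Linux box with an Intel CPU (the exact knobs and core numbers depend on the actual hardware):

```bash
#!/usr/bin/env bash
# Sketch: reduce benchmark noise on a dedicated Linux runner.
# Paths are Linux-/Intel-specific assumptions; adapt to the actual machine.
set -euo pipefail

# Pin the CPU frequency governor to "performance" to avoid scaling noise
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Disable turbo boost (intel_pstate driver) so clock speeds stay constant
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Run the benchmarks pinned to fixed cores, with enough repetitions for benchstat
taskset -c 2,3 go test -run='^$' -bench=. -benchmem -count=10 ./... | tee bench.txt
```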

els0r commented 1 year ago

> As discussed: I'll set up a prototype machine on my personal infrastructure, optimize it for running benchmarks and fire up a GitHub runner on that. Once we've established that this actually works, we can think about next steps.
>
> See also details / progress here: https://github.com/fako1024/slimcap/issues/78

Thanks for digging in. In this scenario, scripting the full setup and teardown probably makes sense. Let's see how it goes.