ajnavarro opened this issue 1 week ago
The option "Execute Benchmarks on Pull Requests" can be reconsidered by disabling the failOnAlert option.
Consider #2915
Run Benchmarks Only on Modified Packages: This approach requires further thought, as it might not be the most effective solution.
Yes, I'd rather we start off with a small set of "core" benchmarks which are run on PRs.
Comparisons with master using benchstat would be nice.
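For what it's worth, a rough sketch of what "run benchmarks only on modified packages" could look like; everything here (running from the repo root, master as the base branch, the helper name) is an assumption for illustration, not an existing script:

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// changedPackages maps files touched since the base branch to their
// Go package directories. Sketch only: assumes a git checkout and
// that it runs at the repository root.
func changedPackages(base string) ([]string, error) {
	out, err := exec.Command("git", "diff", "--name-only", base+"...HEAD").Output()
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var pkgs []string
	for _, f := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if !strings.HasSuffix(f, ".go") {
			continue
		}
		dir := "./" + filepath.Dir(f)
		if !seen[dir] {
			seen[dir] = true
			pkgs = append(pkgs, dir)
		}
	}
	return pkgs, nil
}

func main() {
	pkgs, err := changedPackages("master")
	if err != nil {
		panic(err)
	}
	// A CI job could then run, e.g.:
	//   go test -run=^$ -bench=. <pkgs...>
	// and compare the output against master's with benchstat.
	fmt.Println(pkgs)
}
```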
Run a Subset of Benchmarks: Use the -short flag to execute a smaller set of benchmarks. While this won't completely prevent regressions in the master branch, it will reduce their frequency.
IMO this could also be just a list in the workflow YAML file; using -short
means that benchmarks are by default included, which is not ideal because I'll definitely forget it when I review a PR adding a benchmark. Instead, having a list of benchmarks ensures they have to be explicitly added.
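To illustrate the opt-out behavior being described: with -short, a benchmark only stays out of the run if its author remembered to add a guard like the one below (the package and benchmark names are made up):

```go
package gnovm_test // hypothetical package name

import "testing"

func BenchmarkSomethingExpensive(b *testing.B) { // hypothetical benchmark
	if testing.Short() {
		// The author has to remember this guard; without it the
		// benchmark is included in every -short run by default.
		b.Skip("skipping expensive benchmark in -short mode")
	}
	for i := 0; i < b.N; i++ {
		// ... expensive work ...
	}
}
```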
The non-essential benchmarks we can disable with build tags or flags maybe?
Rest LGTM. :)
IMO this could also be just a list in the workflow YAML file
That might be an extra step that is difficult for all devs to know about. That is how it was implemented before: you needed to add your benchmark to a YAML file. To reduce the hassle to the minimum and avoid repeating the same mistakes, I propose a specific flag, similar to -short, used to indicate whether that benchmark will run regularly or not. It could be -ci.
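A minimal sketch of how such a dedicated -ci flag could work, assuming CI invokes go test with it and heavy benchmarks opt out; the flag name, helper, and benchmark are illustrative, not an existing API:

```go
package bench_test // hypothetical package name

import (
	"flag"
	"testing"
)

// ci would be set by the CI workflow, e.g.:
//   go test -bench=. ./... -args -ci
var ci = flag.Bool("ci", false, "indicates a regular CI benchmark run")

// skipInCI excludes a benchmark from the regular CI benchmark set.
func skipInCI(b *testing.B) {
	b.Helper()
	if *ci {
		b.Skip("not part of the regular CI benchmark set")
	}
}

func BenchmarkGoLevelDBWrites(b *testing.B) { // hypothetical heavy benchmark
	skipInCI(b)
	for i := 0; i < b.N; i++ {
		// ... database-heavy work ...
	}
}
```

Note that, like -short, this is still opt-out per benchmark; an explicit list in the workflow, as suggested above, would make inclusion opt-in instead.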
@thehowl WDYT?
I believe we should establish two levels of benchmarks:
The first level is enforced quietly and applies to all developers, including those writing contracts or fixing typos in a README. There should be no noise unless a PR triggers a global benchmark issue. In that case, the PR will become a topic of discussion regarding benchmarks.

We could implement high-level benchmarks, such as running a blockchain node and processing N transactions. While this may not identify the exact cause of any slowdown, knowing that overall performance has decreased is sufficient for the core team to pause a PR until we investigate further.

Developers should not have to check a box or worry about benchmarks; they should be able to continue their work without concern until a CI check or the core team blocks the merge for performance reasons.
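A rough sketch of what such a level-1, end-to-end benchmark could look like; everything here (the processor interface, the no-op stand-in, the names) is hypothetical scaffolding for illustration, not an existing gno API:

```go
package e2e_test // hypothetical package name

import "testing"

// txProcessor stands in for whatever component ends up applying
// transactions (a node, a mempool+app pair, ...). Hypothetical.
type txProcessor interface {
	DeliverTx(tx []byte) error
}

// nopProcessor is a placeholder so the sketch compiles; a real
// benchmark would start an in-process node here instead.
type nopProcessor struct{}

func (nopProcessor) DeliverTx([]byte) error { return nil }

func BenchmarkProcessNTransactions(b *testing.B) {
	var node txProcessor = nopProcessor{}
	// Pre-build a fixed batch of transactions outside the timed loop.
	const n = 1000
	txs := make([][]byte, n)
	for i := range txs {
		txs[i] = []byte("hypothetical signed tx bytes")
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for _, tx := range txs {
			if err := node.DeliverTx(tx); err != nil {
				b.Fatal(err)
			}
		}
	}
}
```

The point is the granularity: one number for "N transactions end to end", rather than per-function profiling.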
The second level is for PRs that introduce features we want to benchmark from the outset, primarily in tm2 and gnovm, or for those focused on adding benchmarks rather than features. These PRs should prioritize benchmark configuration management, including defaults for CI and local setups.
However, upon reviewing the PR history, it’s unclear if we can conclude that everyone ignored benchmarks; most PRs were simply unrelated to them (case 1). Ignoring benchmarks is definitely more problematic in case 2.
The quick wins that I suggest:
Let's summarize all the requests:
- Benchmark tooling: github-action-benchmark. If none of the existing tools fit all requirements, consider developing a custom benchmark tool.
- Benchmark selection (rather than the -short flag): I would rather prefer to specify a new flag, like -ci. See my previous comment.

Ping me if I missed something or some point needs more clarification / the output I got is wrong.
Working on it right now. Let me get a PoC ready and then we’ll figure out a specific rule to meet this need. :)
Description
Related issues: Benchmark Results
Now that we have a Benchmark MVP working, I'd like to enumerate the requests from several parties regarding the next steps:
Tasks
[ ] Execute Benchmarks on Pull Requests. We had to roll back this feature due to concerns from developers about the time it takes for benchmarks to complete. Here are some proposals to mitigate this issue:
  - Run a Subset of Benchmarks: use the -short flag to execute a smaller set of benchmarks. While this won't completely prevent regressions in the master branch, it will reduce their frequency.
[ ] Disable Non-Essential Benchmarks via Flags. Some benchmarks, like those testing goleveldb (which are testing the database itself), don't need to run on every PR; see the build-tag sketch after this list. Instead, we could focus on specific benchmarks that test particular cases. Here are some examples:
  - the examples package
  - Benchdata performance
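One possible shape for disabling the non-essential benchmarks, as a sketch: guard the heavy benchmark files with a build tag so they are excluded from the default go test run (the tag, package, and benchmark names are assumptions):

```go
//go:build heavybench

package db_test // hypothetical package name

import "testing"

// Only compiled when the (hypothetical) heavybench tag is set:
//   go test -tags heavybench -bench=. ./...
func BenchmarkGoLevelDBRandomWrites(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// ... exercise the goleveldb backend here ...
	}
}
```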
Benchmark Tools
Currently, two GitHub Actions can run Go benchmarks:
If none of these tools meets all our requirements, we should consider developing a new action fitting our use cases.
This is a call for comments (cc @moul and @thehowl).