@MasonM This is on my TODO to test out and approve.
Not really related to this PR: it would be nice if the benchmark results were displayed on PRs as a comment so reviewers could more easily review the performance impact of a PR. I'm guessing it would take some effort to get this properly into CI, though.
I've mentioned having "performance regression tests" before (in https://github.com/argoproj/argo-workflows/pull/13566#issuecomment-2336734101 and elsewhere) that would effectively check for, say, a 10% or greater drop in perf (note that some variability should always be expected as well, as can happen even with the same code on the same machine). Automated comments get missed pretty easily (see the CodeCov comments that went unused in the past) and are non-blocking, so they seem insufficient on their own. An alternative would be something like #13661: run nightly and open an issue if there's a big enough drop.
@agilgur5 That's a good idea, but I don't think that's feasible using the GitHub-hosted action runners due to noisy neighbors. We frequently see spikes in I/O latency during jobs, which is why I increased the timeout in https://github.com/argoproj/argo-workflows/pull/13769. Those spikes are likely to affect benchmarks and cause them to falsely report large performance regressions.
There's not much we can do about noisy neighbors. If we had access to a self-hosted runner, then we'd have more options.
We do have that problem, but afaik most tests run within reasonable bounds, and we rarely increase the timeouts, usually only due to adding things to the tests or CI jobs themselves (like the new step to free up disk space that you mentioned in #13769).
So if the frequency is low, I think an occasional test flake that needs rerunning would be a worthwhile trade-off for preventing performance regressions.
Perhaps the worst case would be a test that only checks that the run time stays under 2x the average, which wouldn't catch smaller regressions but would still catch the worst offenders.
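For concreteness, here's a minimal Go sketch of that kind of coarse guard. Everything in it (the package name, the `listBaseline` value, and the `listArchivedWorkflows` helper) is a hypothetical placeholder rather than code from this repo; it only illustrates the "fail only on >2x the baseline" shape:

```go
package perfguard

import (
	"testing"
	"time"
)

// listBaseline is an assumed average run time from earlier runs; a real check
// would need this value measured and stored somewhere.
const listBaseline = 500 * time.Millisecond

// listArchivedWorkflows stands in for the real archive listing call so the
// sketch is self-contained.
func listArchivedWorkflows() error {
	time.Sleep(10 * time.Millisecond) // simulate work
	return nil
}

// TestListArchivedWorkflowsWithinBudget fails only on a gross (>2x) regression,
// leaving headroom for noisy-neighbor variance on shared runners.
func TestListArchivedWorkflowsWithinBudget(t *testing.T) {
	start := time.Now()
	if err := listArchivedWorkflows(); err != nil {
		t.Fatal(err)
	}
	if elapsed := time.Since(start); elapsed > 2*listBaseline {
		t.Errorf("took %v, which is more than 2x the %v baseline", elapsed, listBaseline)
	}
}
```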
A second problem I forgot to mention is ensuring the baseline results are generated by a runner with comparable performance characteristics to the runner used for the PR results. To be more concrete, imagine you implemented this by having a job run on commits to `main` that runs the benchmarks, then uploads the results as an artifact called something like `baseline_results.txt`. Then, a PR job will run the benchmarks, download `baseline_results.txt`, and compare the two with `benchstat`.
The problem is the runner used for `main` could have totally different CPU, I/O, and memory performance than the runner for the PR job. That means the results aren't comparable, and there's no good way of detecting when this happens, except by using self-hosted runners.
Here's an article I found with some numbers: https://aakinshin.net/posts/github-actions-perf-stability/
> Unfortunately, default GitHub Actions build agents do not provide a consistent execution environment from the performance point of view. Therefore, performance measurements from different builds can not be compared. This makes it almost impossible to set up reliable performance tests based on the default GitHub Actions build agent pool. ... CPU-bound benchmarks are much more stable than Memory/Disk-bound benchmarks, but the “average” performance levels still can be up to three times different across builds.
I found this: https://github.com/cncf/cluster
> The CNCF Community Infrastructure Lab (CIL) provides free access to state-of-the-art computing resources for open source developers working to advance cloud native computing. We currently offer access to both x86 and ARMv8 bare metal servers for software builds, continuous integration, scale testing, and demonstrations for the CNCF projects.
That would be perfect for this. @agilgur5 should I go ahead and submit a request?
Do you think the CNCF GitHub action runners would be consistent enough? https://contribute.cncf.io/resources/newsletters/2024-02-21-projectnewsletter/#github-action-runners
@Joibel They could be, but I don't know if https://github.com/argoproj/ is "part of the CNCF’s GitHub Enterprise account". Do you have access to https://github.com/argoproj/argo-workflows/settings/actions/runners? That should be the URL to manage self-hosted runners. It requires admin access to the repository.
> I don't know if https://github.com/argoproj/ is "part of the CNCF’s GitHub Enterprise account"
Not currently. Crenshaw (CD Lead) actually opened a Service Desk ticket to get access to them in late July: https://cncfservicedesk.atlassian.net/servicedesk/customer/portal/1/CNCFSD-2396 (the link is only accessible to maintainers). Summarizing: apparently CNCF hit some limits, which GH then increased; Argo was supposed to be added but still hasn't been, with Zach (Rollouts Lead) last asking for a follow-up in mid-Sept.
I've now followed up there as well with a gentle nag, partially so that I'll get a notification if the status changes.
Motivation
This adds basic benchmarks for listing and counting archived workflows so we can evaluate potential optimizations, e.g. https://github.com/argoproj/argo-workflows/issues/13601
Modifications
There weren't any other benchmarks in this repo that I could find, so these are the first. By default, `go test` doesn't run benchmarks unless you supply the `-bench` flag, so I added a `make Benchmark%` target for that.
I had to make some adjustments to `test/e2e/fixtures/persistence.go` so it exposed `WorkflowArchive` for these tests.
Due to https://github.com/stretchr/testify/issues/811, we can't directly use `testify` to run these, so I used a workaround.
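For readers unfamiliar with that testify limitation: the workaround amounts to a plain top-level `Benchmark*` function that drives `b.Run` sub-benchmarks itself instead of going through a testify suite. The sketch below only shows that shape; the interface, the fake archive, and the method signatures are simplified placeholders, not the actual code in this PR:

```go
package sketch

import "testing"

// workflowArchive is a trimmed-down stand-in for the WorkflowArchive exposed
// via test/e2e/fixtures/persistence.go; only the calls exercised below are included.
type workflowArchive interface {
	ListWorkflows(limit, offset int) error
	CountWorkflows() (int64, error)
}

// fakeArchive keeps the sketch runnable without a database; the real benchmark
// uses the persistence fixture instead.
type fakeArchive struct{ rows int64 }

func (f *fakeArchive) ListWorkflows(limit, offset int) error { return nil }
func (f *fakeArchive) CountWorkflows() (int64, error)        { return f.rows, nil }

// BenchmarkWorkflowArchive is a plain benchmark function that fans out into
// b.Run sub-benchmarks, since testify suites can't host benchmarks
// (stretchr/testify#811).
func BenchmarkWorkflowArchive(b *testing.B) {
	var archive workflowArchive = &fakeArchive{rows: 100000}

	b.Run("ListWorkflows", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if err := archive.ListWorkflows(100, 0); err != nil {
				b.Fatal(err)
			}
		}
	})

	b.Run("CountWorkflows", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := archive.CountWorkflows(); err != nil {
				b.Fatal(err)
			}
		}
	})
}
```

A `make Benchmark%` target for this presumably boils down to invoking `go test` with `-bench` set to the matching name (and a `-run` pattern that skips regular tests), though the exact flags in the Makefile may differ.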
Verification

Ran `make BenchmarkWorkflowArchive` locally after using https://github.com/argoproj/argo-workflows/pull/13715 to insert 100,000 rows into `argo_archived_workflows`: