containerd / containerd

An open and reliable container runtime
https://containerd.io
Apache License 2.0

Benchmarking Proposal #7378

Open sbuckfelder opened 2 years ago

sbuckfelder commented 2 years ago

Problem Statement

There is currently no consistent way to tell whether commits are causing unexpected changes in containerd’s performance. Adding a benchmarking framework and automation mechanisms would allow the project to understand the performance implications of new code commits.

High Level Solution

Benchmarking Framework

We want the framework to produce metrics concerning the latency of different high-level actions, such as start, stop, and so on. Ideally, the framework will be generic enough that it can be packaged as a library and reused by subprojects (stargz-snapshotter, for example).
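To make the library-and-reuse idea concrete, a minimal sketch of what such a surface could look like is below; the `benchmark` package, `StepFunc`, and `Run` names are hypothetical and do not exist in containerd today.

```go
// Package benchmark is a hypothetical sketch of a reusable benchmarking
// library surface, not existing containerd code.
package benchmark

import (
	"context"
	"time"
)

// StepFunc performs one high-level lifecycle action (e.g. start or stop) once.
type StepFunc func(ctx context.Context) error

// Result holds the raw latency samples collected for a single named step.
type Result struct {
	Step    string
	Samples []time.Duration
}

// Run executes step n times and records the latency of each execution.
// A subproject such as stargz-snapshotter could reuse this by supplying
// its own StepFunc implementations.
func Run(ctx context.Context, name string, n int, step StepFunc) (Result, error) {
	res := Result{Step: name, Samples: make([]time.Duration, 0, n)}
	for i := 0; i < n; i++ {
		begin := time.Now()
		if err := step(ctx); err != nil {
			return res, err
		}
		res.Samples = append(res.Samples, time.Since(begin))
	}
	return res, nil
}
```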

Proposed Statistics

We want not only high-level statistics such as the average, but also distribution information such as percentiles and standard deviation. This will give us confidence in the robustness of the statistics and help us identify worst-case scenarios. Proposed statistics (a sketch of computing them from raw samples follows the list):

Mean
Standard Deviation
Minimum
25th Percentile
50th Percentile (median)
75th Percentile
90th Percentile
Maximum
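Purely as an illustration, these could be computed from the raw latency samples along the following lines; the `Summary` type and `summarize` function are hypothetical, and the nearest-rank percentile definition is an arbitrary choice.

```go
package benchmark

import (
	"math"
	"sort"
	"time"
)

// Summary holds the proposed statistics for one benchmarked step.
type Summary struct {
	Mean, StdDev, Min, P25, P50, P75, P90, Max time.Duration
}

// summarize computes the proposed statistics from raw latency samples.
func summarize(samples []time.Duration) Summary {
	if len(samples) == 0 {
		return Summary{}
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	var sum float64
	for _, s := range sorted {
		sum += float64(s)
	}
	mean := sum / float64(len(sorted))

	var sqDiff float64
	for _, s := range sorted {
		d := float64(s) - mean
		sqDiff += d * d
	}
	stdDev := math.Sqrt(sqDiff / float64(len(sorted)))

	// Nearest-rank percentile over the sorted samples.
	percentile := func(p float64) time.Duration {
		idx := int(math.Ceil(p*float64(len(sorted)))) - 1
		if idx < 0 {
			idx = 0
		}
		return sorted[idx]
	}

	return Summary{
		Mean:   time.Duration(mean),
		StdDev: time.Duration(stdDev),
		Min:    sorted[0],
		P25:    percentile(0.25),
		P50:    percentile(0.50),
		P75:    percentile(0.75),
		P90:    percentile(0.90),
		Max:    sorted[len(sorted)-1],
	}
}
```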

Automation Mechanisms

Ideally we would like this to run on code changes and then identify regressions via comparison to previous runs. This will allow us to see performance regressions between commits. To automate this we can use GitHub Actions as our starting point. A few considerations to keep in mind:

What Shall We Benchmark?

Ideally the benchmarks should be comprehensive across four dimensions: lifecycle steps, platforms, snapshotters, and benchmark containers. A small sketch of how this matrix could be expressed follows the dimension headings below.

Lifecycle Steps

Platforms / Architectures

Snapshotters

Benchmark Containers
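As a purely illustrative sketch, the four dimensions could be captured in a small matrix/config struct; the type and field names are hypothetical, and the example values are limited to those named in the proof of concept below.

```go
// Matrix describes the hypothetical benchmark matrix across the four
// proposed dimensions; values here mirror the proof-of-concept subset.
type Matrix struct {
	LifecycleSteps []string // e.g. "start", "stop"
	Platforms      []string // e.g. "linux/amd64"
	Snapshotters   []string // e.g. "overlayfs", "devmapper"
	Containers     []string // image refs, e.g. "docker.io/library/busybox:latest"
}
```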

Proof of Concept Proposal

For a proof of concept, build a simple version of the benchmark tool that only operates on a subset of the above dimensions:

Lifecycle Steps: start
Platform: Linux
Snapshotters: overlayfs, devmapper
Benchmark Containers: busybox
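A rough sketch of how the start measurement could be driven through the containerd Go client (1.x module path), assuming the default overlayfs snapshotter; the socket path, namespace, container ID, and sleep command are illustrative, error handling is minimal, and a real tool would loop this to collect many samples and clean up between iterations.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "benchmark")

	image, err := client.Pull(ctx, "docker.io/library/busybox:latest", containerd.WithPullUnpack)
	if err != nil {
		log.Fatal(err)
	}

	// A devmapper run could instead select that snapshotter via the container
	// options (and pull/unpack the image for it) rather than using the default.
	container, err := client.NewContainer(ctx, "bench-start-0",
		containerd.WithImage(image),
		containerd.WithNewSnapshot("bench-start-0-snapshot", image),
		containerd.WithNewSpec(oci.WithImageConfig(image), oci.WithProcessArgs("sleep", "60")),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer container.Delete(ctx, containerd.WithSnapshotCleanup)

	// Time the "start" lifecycle step: task creation plus task start.
	begin := time.Now()
	task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStdio))
	if err != nil {
		log.Fatal(err)
	}
	if err := task.Start(ctx); err != nil {
		log.Fatal(err)
	}
	elapsed := time.Since(begin)

	if _, err := task.Delete(ctx, containerd.WithProcessKill); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("start latency: %v\n", elapsed)
}
```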

The resulting metrics will then be compared to the previous run and regressions will be called out via the GitHub Actions interface.
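One possible shape for that comparison step, assuming each run writes its per-step summaries to a JSON file; the file layout, the 10% threshold, and checking only p90 are assumptions rather than a settled design. A non-zero exit plus an `::error::` workflow command is enough for the GitHub Actions job to fail and surface the regression on the pull request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// StepSummary mirrors the statistics a benchmark run might emit, in nanoseconds.
type StepSummary struct {
	Step string  `json:"step"`
	P50  float64 `json:"p50"`
	P90  float64 `json:"p90"`
}

func main() {
	prev := load("previous.json")
	curr := load("current.json")

	const threshold = 1.10 // flag anything more than 10% slower than the previous run
	failed := false
	for step, c := range curr {
		p, ok := prev[step]
		if !ok {
			continue // new step, nothing to compare against yet
		}
		if c.P90 > p.P90*threshold {
			fmt.Printf("::error::%s p90 regressed: %.2fms -> %.2fms\n", step, p.P90/1e6, c.P90/1e6)
			failed = true
		}
	}
	if failed {
		os.Exit(1) // non-zero exit fails the GitHub Actions job
	}
}

// load reads a JSON array of StepSummary records and indexes it by step name.
func load(path string) map[string]StepSummary {
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var steps []StepSummary
	if err := json.Unmarshal(data, &steps); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	out := make(map[string]StepSummary, len(steps))
	for _, s := range steps {
		out[s.Step] = s
	}
	return out
}
```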

This will help us answer questions concerning the overall framework (how to create easily extensible abstractions) and the automation mechanism (how best to use GitHub Actions).

Open Questions

kzys commented 2 years ago

Regarding hosting, we should try https://github.blog/changelog/2022-09-01-github-actions-larger-runners-are-now-in-public-beta/ as well.

estesp commented 2 years ago

Regarding hosting, we should try https://github.blog/changelog/2022-09-01-github-actions-larger-runners-are-now-in-public-beta/ as well.

Good point; I just read that post this morning as well. I assume our CNCF "enterprise"/OSS access will cover those runners, but we may have to dig into that. Definitely seems promising.

dcantah commented 2 years ago

Do we want to run on every PR? Which branches? Should we add a tag similar to ok-to-test?

In my head, the last portion makes sense. A simple example: we can make a reasonable guess beforehand as to whether a given change will actually affect performance at all, and skip using the compute if it would be irrelevant (e.g., someone just fixed a spelling mistake or made doc changes).

I think benchmarking would make sense to run on the currently supported release branches and main.