Security issue notifications
If you discover a potential security issue in s2n, we ask that you notify AWS Security via our vulnerability reporting page. Please do not create a public GitHub issue.
Problem
Similar to #1324 and #2582:
"Changes to the core s2n library (e.g. s2n memory functions, blob, stuffers, crypto) may impact/improve s2n performance. We want to be able to measure performance over time to test for potential performance regressions, which is important for applications that use s2n in performance-critical areas."
However, we would like a solution that can be applied to other projects, and one written in Rust.
Solution
Add tooling that can run in CI to catch performance regressions.
Some high-level comments and open questions from our last discussion:
- tokio has overhead: we might bypass it for micro-benchmarks, or we won't be measuring the right things.
- Duplicate the types of send/receive tests from sibling projects.
- Consider an iterate-batch approach: do the expensive per-test setup all at once, then reset s2n's state so the same test can be run over and over.
- Consider dynamic memory allocation measurements.
- Consider peak memory usage measurements.
- Consider setting up an s2n connection, freeing the connection, and looking at memory usage.
- Would libc memory statistics help? Are they granular enough?
- Consider collecting allocation traffic: tracking calls to malloc/free.
- After the handshake, are we wiping/cleaning up? Something to track for memory optimization.
- Criterion's FAQ points out that it is not recommended for CI; review iai as a recommended addition.
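One way to collect allocation-traffic numbers without depending on libc statistics is a counting wrapper around Rust's global allocator. The sketch below is illustrative (the names and the `Vec` workload are stand-ins, not s2n APIs): it counts every alloc/dealloc so a benchmark can report how many allocations a region such as "set up a connection, then free it" performed.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts allocation traffic.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static DEALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCS.load(Ordering::Relaxed);
    // Stand-in for "set up an s2n connection, then free it".
    let buf: Vec<u8> = vec![0; 4096];
    drop(buf);
    let after = ALLOCS.load(Ordering::Relaxed);
    println!("allocations during region: {}", after - before);
}
```

A counter like this is coarser than a full malloc/free trace, but it is cheap enough to run on every benchmark iteration and exposes a single number that can be tracked over time alongside wall-clock results.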
High level tasks
Milestone 1 - Finalize benchmark approach
- [x] Get bindings functional.
- [x] Make sure binding error handling is working (and catching -1). Note undefined behavior and missing enums.
- [x] Plan for dealing with threading/forking safety (can we do this all in Rust without unit test refactors?).
- [x] Create flamegraphs for each unit test.
- [x] Review unit test flamegraphs and determine which unit tests are reasonable benchmarks.
- [x] Decide whether enough unit test benchmarks provide value.
- [x] Alternative 1: create targeted benchmark routines in Rust or C.
- [x] Alternative 2: PoC Cachegrind/iai.
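For "Alternative 1", a targeted benchmark routine can be sketched in plain std Rust before committing to a harness. The helper below is a hypothetical sketch (a real harness would use criterion or iai, as discussed above): it warms up, times many iterations, and reports the median, which is more robust to scheduler noise than the mean. The `Vec` workload stands in for an s2n operation.

```rust
use std::time::Instant;

/// Times `f` over `iters` iterations after a short warmup and
/// returns the median per-iteration time in nanoseconds.
fn bench<F: FnMut()>(mut f: F, iters: usize) -> u128 {
    for _ in 0..10 {
        f(); // warmup: populate caches, trigger lazy initialization
    }
    let mut samples: Vec<u128> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_nanos()
        })
        .collect();
    samples.sort_unstable();
    samples[samples.len() / 2] // median is robust to outliers
}

fn main() {
    // Stand-in workload for a handshake or stuffer operation.
    let median = bench(|| { std::hint::black_box(vec![0u8; 1024]); }, 101);
    println!("median: {} ns", median);
}
```

Because this avoids any async runtime, it sidesteps the tokio-overhead concern for micro-benchmarks; the trade-off is that it measures the routine in isolation rather than under realistic I/O.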
Milestone 2 - Implementation
- [x] Create tooling that builds the bindings, applies any patches, and runs them as part of cargo bench.
- [x] Create a CodeBuild job that runs a daily benchmark of main and writes the results to external storage.
- [x] Create a CodeBuild job that runs the benchmark on PR updates, pulling in the main-branch baseline metrics.
- [x] Create tooling that publishes the Criterion delta data to S3/GH Pages and updates the PR.
- [x] Decide on pass/fail thresholds and implement failure logic to block on performance issues.
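The pass/fail decision in the last task can be reduced to a small pure function, which keeps the threshold policy testable independently of the CI plumbing. The sketch below is an assumption about the policy (the threshold value and "never block without a baseline" rule are illustrative, not the project's settled choices):

```rust
/// Returns true when the candidate result should fail CI, given the
/// main-branch baseline and a regression threshold in percent.
fn regression_verdict(baseline_ns: f64, candidate_ns: f64, threshold_pct: f64) -> bool {
    if baseline_ns <= 0.0 {
        return false; // no baseline yet: never block the PR
    }
    let delta_pct = (candidate_ns - baseline_ns) / baseline_ns * 100.0;
    delta_pct > threshold_pct
}

fn main() {
    // 5% slower than baseline with a 3% threshold: block the PR.
    assert!(regression_verdict(100.0, 105.0, 3.0));
    // 2% slower: within threshold, pass.
    assert!(!regression_verdict(100.0, 102.0, 3.0));
    // Faster than baseline: always pass.
    assert!(!regression_verdict(100.0, 90.0, 3.0));
    println!("verdicts ok");
}
```

Keeping the verdict logic separate from the CodeBuild jobs also makes it easy to tune the threshold later against the accumulated daily-run data.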
Requirements / Acceptance Criteria
What must a solution address in order to solve the problem? How do we know the solution is complete?
A PR is blocked/flagged due to a (valid) performance regression.
A dive into the longer-term performance data shows identifiable trends.
RFC links: Links to relevant RFC(s)
Related Issues: Link any relevant issues
Will the Usage Guide or other documentation need to be updated?
Testing: How will this change be tested? Call out new integration tests, functional tests, or particularly interesting/important unit tests.
Will this change trigger SAW changes? Changes to the state machine, the s2n_handshake_io code that controls state transitions, the DRBG, or the corking/uncorking logic could trigger SAW failures.
Should this change be fuzz tested? Will it handle untrusted input? Create a separate issue to track the fuzzing work.
Out of scope
Is there anything the solution will intentionally NOT address?