bheisler / criterion.rs

Statistics-driven benchmarking library for Rust

Input on a json schema for bench reporting #693

Open epage opened 1 year ago

epage commented 1 year ago

In a theoretical future where test/bench binaries are encouraged to report results back to cargo as json, which cargo then formats and reports, what would the criterion project see as needed for the json schema? If you can also speak for iai, that'd be great!

I've been considering how we can improve testing in Rust, and one source of friction I've found is the limited contract between cargo and test harnesses. I'm also hoping that this lens for looking at json reporting can help get the schema stabilized.

For reference, this is the current schema for benches.

cargo bench would not be able to cover every reporting need, so I would see us needing extension points in the schema, and cargo criterion could be used when people need a more powerful or specialized workflow.
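
As a rough illustration of what such an extension point could look like (the field names below are made up for this sketch, not taken from any cargo RFC): the record could carry the core fields cargo understands plus a free-form object keyed by tool name, which cargo passes through verbatim and cargo criterion or other runners interpret.

use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};
use serde_json::Value;

// Core fields plus tool-keyed extension data that cargo would pass through
// untouched. Purely illustrative; names are assumptions.
#[derive(Serialize, Deserialize, Debug)]
struct BenchReport {
    name: String,
    median: f64,
    deviation: f64,
    // e.g. "criterion": { "confidence_level": 0.95 } -- hypothetical keys.
    #[serde(default, skip_serializing_if = "BTreeMap::is_empty")]
    extensions: BTreeMap<String, Value>,
}

fn main() {
    let report = BenchReport {
        name: "fib_20".into(),
        median: 26_400.0,
        deviation: 1_200.0,
        extensions: BTreeMap::from([(
            "criterion".to_string(),
            serde_json::json!({ "confidence_level": 0.95 }),
        )]),
    };
    println!("{}", serde_json::to_string_pretty(&report).unwrap());
}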

Issues like #130 would also be relevant.

bend-n commented 11 months ago

The bench schema specified in #49359 is just

struct Bench {
    name: String,
    median: f32,
    deviation: f32,
    mib_per_second: Option<f32>,
}

Do you think that's enough information?

bend-n commented 11 months ago

Serde will, by default, ignore unknown fields, so I think criterion could just... expand that format a little bit?
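
For what it's worth, here is a small sketch of the property being relied on (the extra field names are invented for illustration): a consumer that only knows the core fields still parses a record carrying criterion-specific additions, because serde drops unknown fields unless deny_unknown_fields is set.

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Bench {
    name: String,
    median: f32,
    deviation: f32,
    mib_per_second: Option<f32>,
}

fn main() -> Result<(), serde_json::Error> {
    // A hypothetical extended record with fields a plain consumer doesn't know about.
    let json = r#"{
        "name": "fib_20",
        "median": 26.4,
        "deviation": 1.2,
        "mib_per_second": null,
        "confidence_interval": [25.9, 27.1],
        "unit": "us"
    }"#;

    // Parses fine: "confidence_interval" and "unit" are silently ignored.
    // Annotating Bench with #[serde(deny_unknown_fields)] would make this an error.
    let bench: Bench = serde_json::from_str(json)?;
    println!("{bench:?}");
    Ok(())
}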

epage commented 11 months ago

The main risk for expansion is conflict with future additions. We'd likely want to specify how future evolution should happen.

Currently, my plan (since t-testing-devex hasn't met yet) is to shift the responsibility for high quality reporting from the harness (libtest) to the runner (cargo, cargo nextest, etc). I'd like to do the same for benches so we can get more mileage out of a common, stable bench harness, with all of the fancy features being done in cargo, cargo criterion, etc. The main downside is that it will require pushing a lot of data up per bench.

This also means that if there are things that would be more generally desired from a stable bench harness (throughput?), I'd like to consider them.

This isn't directly handled by criterion, so I don't know how much you can speak to it, but I'd also like to include multiple types of metrics (time, icount, etc), so we'll need the schema to be able to handle that.
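
As a sketch of one way the schema could carry multiple metric kinds (all names here are assumptions, not anything agreed on in this thread), a tagged enum would let wall-clock time, instruction counts, and throughput share one record shape:

use serde::{Deserialize, Serialize};

// Illustrative only: metric kinds a harness might report for one bench.
#[derive(Serialize, Deserialize, Debug)]
#[serde(tag = "kind", rename_all = "snake_case")]
enum Metric {
    // Wall-clock time, e.g. what criterion measures today.
    WallTime { median_ns: f64, deviation_ns: f64 },
    // Instruction count, e.g. what iai-style tools measure.
    InstructionCount { median: u64 },
    // Throughput, if the harness knows the input size.
    Throughput { bytes_per_second: f64 },
}

#[derive(Serialize, Deserialize, Debug)]
struct BenchRecord {
    name: String,
    metrics: Vec<Metric>,
}

fn main() {
    let record = BenchRecord {
        name: "fib_20".into(),
        metrics: vec![
            Metric::WallTime { median_ns: 26_400.0, deviation_ns: 1_200.0 },
            Metric::InstructionCount { median: 184_000 },
        ],
    };
    println!("{}", serde_json::to_string_pretty(&record).unwrap());
}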