Open · jasonwilliams opened 4 years ago

A lot of Rust projects use Criterion, which overrides `cargo bench` and may output something different. Would this project work with that? What does the output need to look like?

For example, I'm using https://github.com/jasonwilliams/criterion-compare-action, which is a wrapper around https://github.com/bheisler/criterion.rs.

This is the output from Criterion on https://github.com/jasonwilliams/boa:
https://bheisler.github.io/criterion.rs/book/user_guide/command_line_output.html
I'm currently not a user of Criterion (since I could not run it on my local macOS). However, as far as I can see from your output example, we can extract the benchmark result and put the remaining output into the extra field. Your contribution would be more than welcome: https://github.com/rhysd/github-action-benchmark/blob/master/CONTRIBUTING.md
Thanks @rhysd
I had a go here; I'm not sure how to fill in the range:
https://regex101.com/r/8mrOSE/5
I suppose the range would be high-value minus low-value, given the output format [low-value median-value high-value].
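Something along these lines could work as a starting point. This is a rough sketch, not the exact regex from the regex101 link above, and the output field names are just illustrative:

```js
// Rough sketch for parsing a Criterion.rs CLI line such as:
//   Bench Fib 10            time:   [169.75 ns 171.26 ns 173.01 ns]
const line = 'Bench Fib 10            time:   [169.75 ns 171.26 ns 173.01 ns]';
const m = line.match(/^(.+?)\s+time:\s+\[([\d.]+) (\S+) ([\d.]+) (\S+) ([\d.]+) (\S+)\]/);
if (m) {
  const low = parseFloat(m[2]);
  const median = parseFloat(m[4]);
  const high = parseFloat(m[6]);
  console.log({
    name: m[1],
    value: median,     // the middle value is Criterion's estimate
    unit: m[5],
    range: high - low, // high-value minus low-value, as suggested above
  });
}
```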
CC @bheisler
Hey, Criterion.rs maintainer here. The command-line output isn't really intended for programmatic access, so the format isn't guaranteed to stay stable. I don't currently plan to change it; this is more of a heads-up that you may need to fix your parsing if it changes in the future.
Criterion.rs generates a file that is designed for programs rather than humans (see here for more on that), but it contains the raw measurements, not the statistics that Criterion.rs calculates from them, so if you wanted to use it you would need to decide how to summarize those numbers. If you go that direction, I would recommend also reading up on the measurement and analysis process to clarify what some of those numbers mean.
Unrelated: @rhysd - if it's not too much trouble, would you mind explaining your difficulties with Criterion.rs on macOS, or perhaps opening an issue on the Criterion.rs repository? I don't have a Mac to test with but it ought to work.
@bheisler Thank you for the comment. I did not know that it supports a machine-readable format. That format would be preferable.
> if it's not too much trouble, would you mind explaining your difficulties with Criterion.rs on macOS
I'm sorry, I can't remember clearly, but when I tried a Criterion example at the time, the benchmark program crashed with a SIGSEGV. Since that was about 6 months ago and I was on an older macOS (10.12), it might work fine now.
OK, thanks. If you decide to try it again and run into the same problem, please feel free to file an issue.
I'm now trying to take over @jasonwilliams's work. I'll let you know if I run into an issue. Thank you for your help.
I tried to implement a simple calculation based on the analysis process document:
```js
const { linear } = require('regression');
const fs = require('fs');

// Each data line of raw.csv is one sample. The column indices below follow
// the raw.csv layout at the time: data[5] is the measured time for the
// sample, data[7] is the sample's iteration count.
const lines = fs.readFileSync('target/criterion/Bench Fib 10/new/raw.csv', 'utf8').split('\n');
const measurements = lines
  .slice(1) // skip the header row
  .filter(l => l !== '')
  .map(l => {
    const data = l.split(',');
    const measured = parseFloat(data[5]);
    const iter = parseInt(data[7], 10);
    return [iter, measured];
  });

// Track the fastest and slowest per-iteration times across the samples.
let max = measurements[0][1] / measurements[0][0];
let min = max;
for (const m of measurements.slice(1)) {
  const v = m[1] / m[0];
  if (v > max) {
    max = v;
  }
  if (v < min) {
    min = v;
  }
}

console.log('min:', min);
// The slope of a linear regression of measured time against iteration
// count estimates the time per iteration.
console.log('estimate:', linear(measurements).equation[0]);
console.log('max:', max);
```
and the output was:

```
min: 160.11831801157643
estimate: 171.46
max: 196.80116265096436
```
Criterion.rs's output was:

```
Bench Fib 10            time:   [169.75 ns 171.26 ns 173.01 ns]
```
so I think the estimate is calculated differently, and my min/max seem wrong (Criterion.rs reports a confidence interval around its estimate, computed by bootstrap resampling, rather than the raw fastest and slowest per-iteration times). Hmm, I'm not confident I can maintain this logic. After all, I'm feeling that using the CLI stdout might be better, even though it may change without a major version bump of Criterion.rs.
The CSV format document says:

> The results of Criterion.rs' analysis of these measurements are not currently available in machine-readable form. If you need access to this information, please raise an issue describing your use case.

so we might adopt a machine-readable summary output in the future.
> so we might adopt a machine-readable summary output in the future.
Yeah, a good idea would be to have github-action-benchmark accept the CLI output for now, and raise a separate issue for migrating to calculating statistics from the CSV output; it's a beefy task and should be worked on separately. The CLI output format won't change any time soon, so we have time to do this properly.
Or Criterion could add the calculated info to the machine-accessible file. @bheisler What do you think?
Or maybe it could support the normal `cargo bench` format with an option? We want to use this action in the https://github.com/clap-rs/clap repo.
Hey folks. I've been thinking about adding better machine-readable output to this cargo-criterion subcommand I've been working on. Specifically, this would include the calculated statistics, not just the raw measurements.

I'm thinking it would be broadly similar to Cargo's existing `--message-format json` option. You'd run `cargo criterion --message-format json` and it would emit JSON messages. Cargo sends the messages over stdout and prints all of its actual output on stderr. I could probably have cargo-criterion do the same, but I'm also considering using a TCP socket on localhost; still undecided there. Naturally, you would have to `cargo install cargo-criterion` first.
The messages themselves are still TBD, but I do want to keep my options open for the future, so I'm thinking that the only field guaranteed to be there would be some estimate of a typical value. All of the other stats (mean, variance, MAD, confidence bounds, etc.) would be officially documented as optional so that I still have the option not to report them. For example, I'm imagining an alternate sampling method for long-running benchmarks where we just run the benchmark, say, 100 times and record each individual time. Those measurements would produce a different set of statistics, and I wouldn't want to break downstream code by only sometimes reporting, say, the slope measurements.

I'm also thinking I might set it up to parse bencher's statistics from its stdout output - then I could incorporate bencher benchmarks into cargo-criterion's reports and JSON messages, but of course bencher reports far fewer statistics than Criterion.rs does, so a lot of things would be missing from those JSON messages if I did that.
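To illustrate (none of this is final), a consumer might read newline-delimited JSON from stdout like this; the `id` and `typical` field names are placeholders, not a committed schema:

```js
// Hypothetical consumer of cargo-criterion's proposed JSON messages.
// The message schema is TBD; `id` and `typical` are invented field names.
const { spawn } = require('child_process');
const readline = require('readline');

const child = spawn('cargo', ['criterion', '--message-format', 'json']);
const rl = readline.createInterface({ input: child.stdout });
rl.on('line', line => {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch (e) {
    return; // ignore non-JSON lines
  }
  // Only some estimate of a typical value would be guaranteed; mean,
  // variance, MAD, confidence bounds, etc. would all be optional.
  if (msg.typical !== undefined) {
    console.log(msg.id, msg.typical);
  }
});
```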
None of this is a promise, mind - I'm still designing a lot of this stuff and figuring out what makes sense. bencher support is particularly dubious, but it's a possibility.
Would that work for what you folks need? Do you have any comments or concerns with this idea so far? Is there anything in particular you want to have in the message format?
@bheisler This is already solved with the bencher-compatible output Criterion provides. It is the reason I asked you to add it originally.
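For reference, the bencher-compatible output follows the libtest format, so extracting the numbers is straightforward; a rough sketch (not this action's actual parser):

```js
// Illustrative only: parse a libtest/bencher-style line such as
//   test fib_10 ... bench:         171 ns/iter (+/- 5)
const line = 'test fib_10 ... bench:         171 ns/iter (+/- 5)';
const m = line.match(/^test (\S+)\s+\.\.\. bench:\s+([\d,]+) ns\/iter \(\+\/- ([\d,]+)\)/);
if (m) {
  console.log({
    name: m[1],
    value: parseInt(m[2].replace(/,/g, ''), 10), // ns per iteration
    range: parseInt(m[3].replace(/,/g, ''), 10), // the +/- value
  });
}
```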
@bheisler Support for cargo-criterion has been implemented/tested in #145.
If anyone else is blocked by this or by https://github.com/benchmark-action/github-action-benchmark/pull/145, Bencher supports Criterion: https://github.com/bencherdev/bencher#supported-benchmark-harnesses