bheisler / criterion.rs

Statistics-driven benchmarking library for Rust
Apache License 2.0
4.31k stars 292 forks source link

"bencher" output format doesn't quite match what libtest creates #704

Open d-e-s-o opened 1 year ago

d-e-s-o commented 1 year ago

Thanks for adding the bencher output format. I have both libtest based benchmarks as well as criterion based ones, and I use this output format to have better comparability.

However, I noticed that the formats don't quite match up (which I will say kind of runs against the stated purpose: "it may be useful to support external tools that can parse this output"). Specifically, this is output generated by libtest benchmarks:

running 4 tests
test dwarf::units::tests::bench_function_parsing           ... bench:  13,660,119 ns/iter (+/- 70,841)
test dwarf::units::tests::bench_function_parsing_addr2line ... bench:  11,567,597 ns/iter (+/- 78,162)
test dwarf::units::tests::bench_line_parsing               ... bench:   4,888,411 ns/iter (+/- 957,070)
test dwarf::units::tests::bench_line_parsing_addr2line     ... bench:   5,278,859 ns/iter (+/- 336,390)

Compare that to criterion based ones:

test main/inspect :: lookup_dwarf ... bench:   208037937 ns/iter (+/- 3266769)
test main/normalize :: normalize_process ... bench:      182202 ns/iter (+/- 4353)
test main/symbolize :: symbolize_process ... bench:     4074900 ns/iter (+/- 122307)
test main/symbolize :: symbolize_dwarf ... bench:   140352319 ns/iter (+/- 3505828)
test main/symbolize :: symbolize_gsym ... bench:       42150 ns/iter (+/- 1559)

(some of those criterion benchmarks are in different groups; I haven't checked whether that has any impact)

There are two main differences that I can make out:

Would it make sense and be possible to align the two more closely?

d-e-s-o commented 1 year ago

Thousands separators don't seem too hard to add. See #705. As for the padding/alignment, here is the relevant logic in libtest, for reference:

https://github.com/rust-lang/rust/blob/557359f92512ca88b62a602ebda291f17a953002/library/test/src/console.rs#L299-L303

https://github.com/rust-lang/rust/blob/557359f92512ca88b62a602ebda291f17a953002/library/test/src/formatters/pretty.rs#L171-L181

I believe given the current state of the reporter logic that this requires a bit of a restructuring of internals to be feasible. Right now the reporter does not know of test names ahead of time from what I gather. If you have any suggestions, please let me know.