Closed aclements closed 1 week ago
Other useful Unit metadata would be scaling (see benchstat’s newScaler) and metric names (see benchstat’s metricOf).
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
and the following for command-line flags
-unit "unit key=value key=value ..."
I don't understand how this flag would be used. What tools do you envision supporting this flag?
Should this proposal address how benchmark code can control the properties for the units it uses?
I am wondering how a benchmark that doesn't use the default units would communicate its unit properties under this proposal. Take, for example, BenchmarkParallelTimerLatency
in the time package.
I am reminded of this conversation: https://go-review.googlesource.com/c/go/+/232298/8..12/src/time/sleep_test.go#b585
I don't understand how this flag would be used. What tools do you envision supporting this flag?
benchstat, namely. I don't plan on adding support for this to the current version of benchstat, but I have a complete rewrite almost done that I would like to add support to. We're also working on automated performance monitoring (#48803), which wouldn't use the command-line flag specifically, but we would like to have unit properties for.
Support for "exact" measurements in particular has been a sore spot in benchstat for benchmarks that record things like "binary size", for example. Since that's not going to change with repeated measurement, we only report one sample, but that means that benchstat will refuse to show you deltas because there isn't enough data for it to statistically significant under general distributional assumptions.
I am wondering how a benchmark that doesn't use the default units would communicate its unit properties under this proposal.
They can actually just fmt.Print a "Unit" line in a sync.Once
from a benchmark function.
I am reminded of this conversation: https://go-review.googlesource.com/c/go/+/232298/8..12/src/time/sleep_test.go#b585
Heh, indeed. Benchstat v2 will handle this much better, without the need for unit properties. It parses units, and will normalize "ns" to "sec", then apply reasonable scaling to the result.
The improvements to benchstat sound really nice.
They can actually just fmt.Print a "Unit" line in a
sync.Once
from a benchmark function.
That seems ... unsatisfying. It also doesn't help discoverability much. Can we do anything more ergonomic?
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
@ChrisHines , I agree it's not particularly ergonomic, though I think it's ultimately a follow-on issue to this proposal. We added format and tooling support for custom metrics long before the testing package had B.ReportMetric
(and there was no workaround there!).
I would add the unit properties specification to the Go benchmark format spec and could certainly mention there that testing.B
benchmarks can just print these Units lines. Maybe it's worth mentioning in the doc for ReportMetric
?
@aclements What is the target user base is for this feature?
If we want people writing benchmarks for their own code to know about and use this feature then I think the testing
package docs are the place to put the information. That's my reference point when writing benchmarks. If it is anywhere else I suspect most people will never learn about it.
If the target user base is narrower for some reason, perhaps just those contributing to the Go itself, then maybe it belongs in a different place.
... I think it's ultimately a follow-on issue to this proposal. We added format and tooling support for custom metrics long before the testing package had
B.ReportMetric
...
From that I think I understand that this proposal is limited to the Go benchmark data format and benchstat CLI flags. The proposal title "testing: benchmark unit properties" gave me the impression that the testing package itself was in scope. My apologies if I got too excited and jumped the gun.
@aclements What is the target user base is for this feature?
I am intending this for anyone who writes Go benchmarks, generally more sophisticated ones (certainly if you're not using custom units you don't have much need for this!). Curiously, we don't actually reference the Go benchmark format or benchstat anywhere in the testing package documentation. It's really quite non-prescriptive. I think you're right that documenting this in the testing package is the right thing to do, though I think we'll have to pull in a bit more of the bigger Go benchmarking picture at the same time (which isn't a bad thing).
The proposal title "testing: benchmark unit properties" gave me the impression that the testing package itself was in scope.
That's fair. Maybe I should have titled this "x/perf:" or something. But I'm not sure you need to curb your enthusiasm. These things are all intentionally weakly coupled: as you point out, it won't initially be super ergonomic to access this from the testing package, but you can, and if this proposal is accepted, the broader tooling will make use of this.
No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group
Change https://golang.org/cl/357530 mentions this issue: 14313-benchmark-format: add unit metadata
Change https://golang.org/cl/357549 mentions this issue: benchfmt: add support for unit metadata
Change https://golang.org/cl/357590 mentions this issue: testing: document custom units
Change https://golang.org/cl/309969 mentions this issue: cmd/benchstat: new version of benchstat
This got lost in the mists of time, but has now been submitted to the benchmark format spec.
I propose extending the Go benchmark format with a way to specify properties of units. Especially now that the
testing
package supports reporting custom metrics, it would be useful for tools that process benchmark results to understand basic properties of custom units in benchmark results. For example, there are two unit properties I would like forbenchstat
: 1. whether higher or lower values are better, and 2. whether measurements come from some underlying distribution (e.g., performance measurements) or are exact (e.g., file sizes).I don't have strong opinions about the syntax. I would like the syntax for this to make sense both in the benchmark format itself, and on the command line of tools. To have something concrete, I propose the following for the benchmark format
and the following for command-line flags
Starting the line with
Unit
parallels how benchmark result lines start withBenchmark
. Usingkey=value
parallels both flag syntax and benchmark name configuration. An obvious alternate would bekey:value
, but this seems too similar without being the same as file-level benchmark configuration, which requires whitespace after the colon (it's unfortunate that benchmark name configuration uses=
instead of:
). Another option is-key=value
to exactly parallel flag syntax, but that gets awkward when nested on the command line and raises the question of whether we also support-key value
.I propose it is an error to specify conflicting properties about a unit.
I also propose that a unit property applies to all following benchmark results, and that it is unspecified whether it applies to earlier benchmark results. The tools I have built that would benefit from unit properties aggregate all results before presenting them, so unit properties naturally apply to all results. But I don't want to force purely streaming tools to read all results before emitting anything just in case there's a later unit property line. Alternatively, we could say it's an error if a property is supplied for any unit that's already been seen, but this seems overly restrictive for aggregating tools.
For specific properties, I propose
{higher,lower}={better,worse}
for indicating direction. Graphs and tables are often labeled "higher is better" or "lower is better", and this is meant to parallel that. I propose supporting all four combinations so people don't have to remember which works.assume={nothing,exact}
for specifying statistical assumptions.nothing
means to make no statistical assumptions (use non-parametric methods) andexact
means to assume measurements are exact (e.g., repeated measurement is not necessary to increase confidence). The default isnothing
. In the future we could also supportnormal
, but that's almost never the right assumption for benchmarks, so I'm omitting it for now.Some concrete examples:
Unit ns/op lower=better assume=nothing
,Unit MB/s higher=better assume=nothing
, etc. These would be built-in to the tooling.Unit text-bytes lower=better assume=exact
or on the command-line as-unit "text-bytes lower=better assume=exact"
./cc @rsc @mknyszek @dr2chase @jeremyfaller