Scale built-in Measure units #311

Open · epompeii opened 7 months ago

epompeii commented 7 months ago

Currently, the built-in Measures use the most exact possible units. For example, Latency uses nanoseconds. This can lead to some rather ridiculously large Metrics (e.g. 13,000,000,000,000 nanoseconds). In the Perf Plots:

1. Create a way to scale these units
2. Auto-select the best scale based on the current dataset
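
For concreteness, here is a minimal sketch of both points, assuming the Latency Measure's nanosecond base; the thresholds, names, and TypeScript are illustrative rather than an actual implementation:

```typescript
// Illustrative only: candidate scales for a nanosecond-based Measure,
// ordered from largest to smallest factor.
const SCALES: [number, string][] = [
  [1e9, "s"],
  [1e6, "ms"],
  [1e3, "µs"],
  [1, "ns"],
];

// Auto-select the best scale for a dataset: pick the largest unit that
// keeps the biggest value at or above 1, then rescale every point with it.
function autoScale(valuesNs: number[]): { values: number[]; unit: string } {
  const max = Math.max(...valuesNs);
  const [factor, unit] =
    SCALES.find(([f]) => max >= f) ?? SCALES[SCALES.length - 1];
  return { values: valuesNs.map((v) => v / factor), unit };
}

// 13,000,000,000,000 ns would plot as 13,000 s.
console.log(autoScale([13_000_000_000_000, 9_500_000_000_000]));
```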

epompeii commented 4 months ago

This will be pretty trivial to implement for the built-in Measure units. However, I don't want the built-in Measures to be magical or treated any differently than custom Measures.

With that said, I think that Measures should be able to supply a `scale_units` function that implements a set interface (TBD).

For security and approachability reasons, I think that AssemblyScript would be a good choice here, with the `scale_units` functions run in WASM.
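
As a rough sketch of what a Measure-supplied module might export (the interface is TBD, so the function names and signatures below are hypothetical), in AssemblyScript:

```typescript
// Hypothetical AssemblyScript exports for a nanosecond-based Measure; the
// real interface is TBD, so these names and signatures are illustrative.

// Return the divisor that maps a raw value in base units (nanoseconds)
// to the unit chosen by scale_label below.
export function scale_factor(value: f64): f64 {
  if (value >= 1e9) return 1e9;
  if (value >= 1e6) return 1e6;
  if (value >= 1e3) return 1e3;
  return 1.0;
}

// Return the unit label matching the factor selected by scale_factor.
export function scale_label(value: f64): string {
  if (value >= 1e9) return "s";
  if (value >= 1e6) return "ms";
  if (value >= 1e3) return "µs";
  return "ns";
}
```

Compiled to WASM, a module like this could run sandboxed the same way whether the Measure is built in or custom.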

epompeii commented 2 months ago

This should also work in the benchmark results table, so the solution here should be persisted across all views. That is, this should not just be for the Perf Plot UI.

thomaseizinger commented 1 month ago

I want to suggest not coupling this to the metric itself. In fact, even the unit itself seems to be unnecessarily coupled to the metric?

We have some benchmarks that measure throughput. For those particular ones, the metric is bits / s in the MBit range. But for other benchmarks, the same metric could mean something else in a different range, right?

So perhaps this needs to be separate from the metrics and instead associated with a benchmark to make the plots look nice.

epompeii commented 1 month ago

I think what you are advocating for is the way things work now.

> For those particular ones, the metric is bits / s in the MBit range

So the Measure here would be Throughput as bits / s.

When this issue is completed, Bencher would have a way to scale bits / s to MB / s for your results (Metrics) in the MBit range.

Does that make sense? Let me know if I'm missing anything for your use case though.

thomaseizinger commented 1 month ago

> > For those particular ones, the metric is bits / s in the MBit range
>
> So the Measure here would be Throughput as bits / s.
>
> When this issue is completed, Bencher would have a way to scale bits / s to MB / s for your results (Metrics) in the MBit range.
>
> Does that make sense? Let me know if I'm missing anything for your use case though.

Currently, a unit is associated with a measure, like bits / s with throughput.

What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?

Would you expect to define more specific throughputs then? It feels odd that a measure is something you define as a separate entity to be reused across benchmarks, even though it might not apply to all benchmarks in a project. Adding scaling complicates this further: What if two benchmarks in the same project use bits / s but one is in the GB/s range and the other in KB/s? Do I need to make two different measures for this too?

epompeii commented 1 month ago

> Currently, a unit is associated with a measure, like bits / s with throughput.

Correct!

> What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?
>
> Would you expect to define more specific throughputs then?

I would recommend creating two Measures:

  1. `Bandwidth` as `bits / second`
  2. `Syscalls` as `syscalls / second`

> It feels odd that a measure is something you define as a separate entity to be reused across benchmarks, even though it might not apply to all benchmarks in a project.

Can you help me understand why this feels odd?

Bencher needs to be able to support benchmarks that have anywhere from one to many different measurements. We can't assume that all of these measurements will always be used by all benchmarks, though.

> Adding scaling complicates this further: What if two benchmarks in the same project use bits / s but one is in the GB/s range and the other in KB/s? Do I need to make two different measures for this too?

The current thought is that you would use one Measure for both of these cases using the lowest, indivisible units as the "base". For your example this would be bits / s. When this issue is completed, the UI will scale the units as necessary. The Measure would provide an interface for this scaling.

So when viewing benchmarks in the KB / s range, the results would be plotted in KB / s, and benchmarks in the bits / s range would be plotted in bits / s.
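
A minimal sketch of that per-view behavior, assuming bits / s as the base unit (the scale table and function names are illustrative, not Bencher's API):

```typescript
// Illustrative scales for a bits / s base unit, largest factor first.
const BIT_SCALES: [number, string][] = [
  [1e9, "Gbit / s"],
  [1e6, "Mbit / s"],
  [1e3, "Kbit / s"],
  [1, "bits / s"],
];

// Each benchmark's series picks its own factor, so a high-range benchmark
// and a low-range benchmark in the same project both plot in readable units.
function scalePerBenchmark(
  series: Map<string, number[]>
): Map<string, { values: number[]; unit: string }> {
  const scaled = new Map<string, { values: number[]; unit: string }>();
  for (const [benchmark, values] of series) {
    const max = Math.max(...values);
    const [factor, unit] =
      BIT_SCALES.find(([f]) => max >= f) ?? BIT_SCALES[BIT_SCALES.length - 1];
    scaled.set(benchmark, { values: values.map((v) => v / factor), unit });
  }
  return scaled;
}
```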

thomaseizinger commented 1 month ago

> What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s? Would you expect to define more specific throughputs then?

> I would recommend creating two Measures:
>
> 1. `Bandwidth` as `bits / second`
> 2. `Syscalls` as `syscalls / second`

Okay, this makes more sense! I think I got confused because the throughput metric was pre-created and said something generic like "operations / s", so I thought this was meant to be used for all kinds of throughputs. Maybe that could be made more specific so it is clear that creating new measures is something one is expected to do?

epompeii commented 1 month ago

> I think I got confused because the throughput metric was pre-created and said something generic like "operations / s", so I thought this was meant to be used for all kinds of throughputs.

Ah, yeah. The built-in Throughput is there because some of the general-purpose benchmarking harnesses report throughput. The units they all use for the numerator are likewise rather generic. I just went with operations to standardize across them.

> Maybe that could be made more specific so it is clear that creating new measures is something one is expected to do?

Definitely! The only nuance is that it is only expected when creating a custom benchmarking harness. I have some verbiage to this effect in the how to track custom benchmarks docs. However, I specifically chose to use the built-in Latency Measure for the example to keep things simple. It may be worth adding a second, custom Measure example there as well.

thomaseizinger commented 1 month ago

~~Maybe "Throughput" could be named "dummy-throughput" instead? Or, "mock-throughput"?~~

nvm, just read the thing about custom harnesses. Wouldn't it always make sense for users to create a custom measure? Perhaps the measure for built-in adapters needs to be configurable via an env var so users can set it?

epompeii commented 1 month ago

> Wouldn't it always make sense for users to create a custom measure?

It would not always make sense.

> Perhaps the measure for built-in adapters needs to be configurable via an env var so users can set it?

This is up to the benchmarking harness itself to support. I have yet to see this in the wild, though.