enso-org / enso

Hybrid visual and textual functional programming.
https://enso.org

Improve Engine benchmark reporting #5714

Open · Akirathan opened this issue 1 year ago

Akirathan commented 1 year ago

Currently, we report benchmarks in a very naive fashion: for each benchmark name (label), we report only a score, which measures how many iterations were completed per millisecond. The benchmark reporting sources are located in the org.enso.interpreter.bench package. All the benchmark jobs defined in Benchmark Actions upload a single bench-report.xml artifact that can only be compared manually with the artifacts from other jobs. This is very inconvenient.
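As a rough, purely illustrative sketch (the actual computation lives in `org.enso.interpreter.bench` and may differ in detail), the reported score is just a throughput figure:

```java
// Purely illustrative: the score reported per benchmark label is a throughput
// figure, i.e. how many iterations completed per millisecond of measurement.
long iterations = 10_000;         // iterations completed during measurement
double elapsedMillis = 2_500.0;   // wall-clock duration of the measurement phase
double score = iterations / elapsedMillis;  // 4.0 iterations/ms, the only reported value
```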

Required fields per benchmark

Every benchmark report item should have at least these properties:

We should also think about a better file format than XML; JSON files, for example, are easier to manipulate programmatically.
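As a purely illustrative sketch of that idea (not a proposal for a concrete schema), a JSON entry could bundle the score with the metadata needed to compare runs. Every field and value below is invented, and Jackson is used only because it is a common choice for JSON serialization in Java:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical shape of a single benchmark report entry. The field names are
// invented for illustration; they are not an agreed schema.
record BenchEntry(
        String label,        // fully qualified benchmark name
        double score,        // iterations per millisecond
        String commit,       // commit the benchmark ran against
        String timestamp) {  // when the run finished (ISO-8601)
}

public class BenchReportJson {
    public static void main(String[] args) throws Exception {
        var entry = new BenchEntry(
                "org.enso.interpreter.bench.ListBenchmarks.mapOverList",
                42.7, "abc1234", "2023-08-22T10:15:30Z");
        // Jackson 2.12+ serializes Java records out of the box.
        String json = new ObjectMapper()
                .writerWithDefaultPrettyPrinter()
                .writeValueAsString(entry);
        System.out.println(json);
    }
}
```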

Sufficient warmup

We should also think about how to ensure that there are no ongoing Graal compilations during the measurement phases, i.e., that the benchmark is stable. Ongoing compilations during measurement may signal that the warmup was insufficient, and they will also skew the score. Note that compilations are expected during the warmup iterations.

An idea for automatically tracking sufficient warmup is described in https://github.com/enso-org/enso/issues/6271#issuecomment-1507924937
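As one possible heuristic, sketched here only to make the idea concrete (this is not the approach from the linked comment, and whether Truffle/Graal compilations are reflected in this MXBean depends on the JVM configuration), one could compare the JVM's cumulative JIT compilation time before and after the measurement phase:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Rough heuristic sketch: if the JVM's cumulative JIT compilation time grows
// while we are measuring, compilations were still in flight and the warmup
// was probably too short.
public class WarmupCheck {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();

        long before = jit.getTotalCompilationTime();
        runMeasuredIterations();               // hypothetical measurement phase
        long after = jit.getTotalCompilationTime();

        if (after > before) {
            System.err.printf(
                "Warning: %d ms of JIT compilation happened during measurement; "
                + "the benchmark may not have been warmed up.%n", after - before);
        }
    }

    private static void runMeasuredIterations() {
        // Placeholder for the actual benchmark measurement loop.
    }
}
```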

### Tasks
- [ ] Improve the bench reports to contain at least the fields mentioned above.
- [ ] Make sure that the warmup is sufficient for all the benchmarks.
- [X] Explore some third-party technologies that can automatically visualize benchmark results
- [X] Enso's `Bench.measure` method can optionally have the same output as JMH. We already have a substantial amount of these benchmarks in the `tests/Benchmark/` module.

A single endpoint can be, e.g., a connection to a MongoDB. The Enso script that processes the results can connect to this MongoDB and download all the relevant benchmark reports.
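A minimal sketch of that download step, assuming a hypothetical `benchmarks` database with a `reports` collection (the connection string, database, collection, and field names are all invented for illustration). It uses the official MongoDB Java driver rather than Enso, just to show the shape of the query:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// Sketch only: downloads benchmark reports for a given branch from a
// hypothetical MongoDB endpoint.
public class DownloadReports {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://bench.example.org:27017")) {
            MongoCollection<Document> reports = client
                    .getDatabase("benchmarks")
                    .getCollection("reports");

            // Fetch every report recorded for the develop branch.
            for (Document report : reports.find(Filters.eq("branch", "develop"))) {
                System.out.printf("%s @ %s: score %.3f%n",
                        report.getString("label"),
                        report.getString("commit"),
                        report.getDouble("score"));
            }
        }
    }
}
```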

Related issues

Edit 2023-08-22

The current state is that:

I am still not closing this issue, since it contains some valuable information about what we should include in the benchmark result output. Currently, the benchmark results are still XML files with a single double score value.

Akirathan commented 1 year ago

Assigned the p-medium priority, because by not automating the reporting we effectively waste roughly 3 hours of CPU time on every benchmark run. I assume nobody is manually checking the result of each benchmark.

wdanilo commented 1 year ago

@Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel, with some benchmarks on one machine and others on another?

Akirathan commented 1 year ago

> @Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel, with some benchmarks on one machine and others on another?

Let's continue the discussion at #5718, I believe you can find some answers there.