enso-org / enso

Hybrid visual and textual functional programming.
https://enso.org

Improve Engine benchmark reporting #5714

Open · Akirathan opened this issue 1 year ago

Akirathan commented 1 year ago

Currently, we report benchmarks in a very naive fashion: for each benchmark name (label), we report only a score, which measures how many iterations were completed per millisecond. The benchmark reporting sources are located in the org.enso.interpreter.bench package. All the benchmark jobs defined in Benchmark Actions upload a single bench-report.xml artifact that can only be compared manually with the artifacts from other jobs. This is very inconvenient.
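As a rough, purely illustrative sketch (the actual computation lives in `org.enso.interpreter.bench` and may differ in detail), the reported score is just a throughput figure:

```java
// Purely illustrative: the score reported per benchmark label is a throughput
// figure, i.e. how many iterations completed per millisecond of measurement.
long iterations = 10_000;         // iterations completed during measurement
double elapsedMillis = 2_500.0;   // wall-clock duration of the measurement phase
double score = iterations / elapsedMillis;  // 4.0 iterations/ms, the only reported value
```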

Required fields per benchmark

Every benchmark report item should have at least these properties:

We should also think about a better file format than XML; JSON files, for example, are easier to manipulate programmatically.
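As a purely illustrative sketch of that idea (not a proposal for a concrete schema), a JSON entry could bundle the score with the metadata needed to compare runs. Every field and value below is invented, and Jackson is used only because it is a common choice for JSON serialization in Java:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical shape of a single benchmark report entry. The field names are
// invented for illustration; they are not an agreed schema.
record BenchEntry(
        String label,        // fully qualified benchmark name
        double score,        // iterations per millisecond
        String commit,       // commit the benchmark ran against
        String timestamp) {  // when the run finished (ISO-8601)
}

public class BenchReportJson {
    public static void main(String[] args) throws Exception {
        var entry = new BenchEntry(
                "org.enso.interpreter.bench.ListBenchmarks.mapOverList",
                42.7, "abc1234", "2023-08-22T10:15:30Z");
        // Jackson 2.12+ serializes Java records out of the box.
        String json = new ObjectMapper()
                .writerWithDefaultPrettyPrinter()
                .writeValueAsString(entry);
        System.out.println(json);
    }
}
```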

Sufficient warmup

We should also think about how to ensure that there are no ongoing Graal compilations during the measurement phases, i.e., that the benchmark is stable. Ongoing compilations during measurement may signal that the warmup was insufficient, and they will also skew the score. Note that compilations are expected during the warmup iterations.

An idea for automatically tracking sufficient warmup is described in https://github.com/enso-org/enso/issues/6271#issuecomment-1507924937
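As one possible heuristic, sketched here only to make the idea concrete (this is not the approach from the linked comment, and whether Truffle/Graal compilations are reflected in this MXBean depends on the JVM configuration), one could compare the JVM's cumulative JIT compilation time before and after the measurement phase:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Rough heuristic sketch: if the JVM's cumulative JIT compilation time grows
// while we are measuring, compilations were still in flight and the warmup
// was probably too short.
public class WarmupCheck {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();

        long before = jit.getTotalCompilationTime();
        runMeasuredIterations();               // hypothetical measurement phase
        long after = jit.getTotalCompilationTime();

        if (after > before) {
            System.err.printf(
                "Warning: %d ms of JIT compilation happened during measurement; "
                + "the benchmark may not have been warmed up.%n", after - before);
        }
    }

    private static void runMeasuredIterations() {
        // Placeholder for the actual benchmark measurement loop.
    }
}
```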

### Tasks
- [ ] Improve the bench reports to contain at least the fields mentioned above.
- [ ] Make sure that the warmup is sufficient for all the benchmarks.
- [X] Explore some third-party technologies that can automatically visualize benchmark results
- [X] Enso's `Bench.measure` method can optionally have the same output as JMH. We already have a substantial amount of these benchmarks in the `tests/Benchmark/` module.

A single endpoint can be, e.g., a connection to a MongoDB. The Enso script that processes the results can connect to this MongoDB and download all the relevant benchmark reports.
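A minimal sketch of that download step, assuming a hypothetical `benchmarks` database with a `reports` collection (the connection string, database, collection, and field names are all invented for illustration). It uses the official MongoDB Java driver rather than Enso, just to show the shape of the query:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// Sketch only: downloads benchmark reports for a given branch from a
// hypothetical MongoDB endpoint.
public class DownloadReports {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://bench.example.org:27017")) {
            MongoCollection<Document> reports = client
                    .getDatabase("benchmarks")
                    .getCollection("reports");

            // Fetch every report recorded for the develop branch.
            for (Document report : reports.find(Filters.eq("branch", "develop"))) {
                System.out.printf("%s @ %s: score %.3f%n",
                        report.getString("label"),
                        report.getString("commit"),
                        report.getDouble("score"));
            }
        }
    }
}
```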

Related issues

Edit 2023-08-22

The current state is that:

I am still not closing this issue, since it contains some valuable information about what we should include in the benchmark result output. Currently, the benchmark results are still XML files with a single double score value.

Akirathan commented 1 year ago

Assigned the p-medium priority, because by not automating the reporting we effectively waste roughly 3 hours of CPU time on every benchmark run. I assume nobody is manually checking the result of each benchmark.

wdanilo commented 1 year ago

@Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel, with some benchmarks on one machine and others on another?

Akirathan commented 1 year ago

> @Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel, with some benchmarks on one machine and others on another?

Let's continue the discussion at #5718, I believe you can find some answers there.