Akirathan opened this issue 1 year ago
Assigned p-medium priority, because by not automating the reporting we effectively waste roughly 3 hours of CPU time on every benchmark run. I assume nobody is manually checking the result of each benchmark.
@Akirathan would a faster machine help us here? If so, let's buy a faster machine. Please talk with @mwu-tow about what kind of benefit it could bring. Also, can we run benchmarks in parallel - some benchmarks on one machine, others on another?
Let's continue the discussion in #5718; I believe you can find some answers there.
Currently, we report benchmarks in a very naive fashion - for each benchmark name (label), we report only the score, which measures how many iterations were done in a millisecond. The benchmark reporting sources are located in the `org.enso.interpreter.bench` package. All the benchmark jobs defined in Benchmark Actions upload a single `bench-report.xml` artifact that can be manually compared to other such artifacts from different jobs. This is very inconvenient.
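As an illustration of what automating that comparison could look like, here is a minimal Java sketch that reads two report artifacts and prints the per-label score ratio. It assumes each report contains `case` elements with `label` and `score` children; the actual `bench-report.xml` schema may differ.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Compares two bench-report.xml artifacts and prints the score ratio per label. */
public class CompareBenchReports {

  // Assumption: each report contains <case> elements with <label> and <score> children.
  // The real schema may differ; adjust the element names accordingly.
  static Map<String, Double> readScores(Path report) throws Exception {
    Document doc =
        DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(report.toFile());
    Map<String, Double> scores = new LinkedHashMap<>();
    NodeList cases = doc.getElementsByTagName("case");
    for (int i = 0; i < cases.getLength(); i++) {
      Element item = (Element) cases.item(i);
      String label = item.getElementsByTagName("label").item(0).getTextContent();
      double score =
          Double.parseDouble(item.getElementsByTagName("score").item(0).getTextContent());
      scores.put(label, score);
    }
    return scores;
  }

  public static void main(String[] args) throws Exception {
    Map<String, Double> baseline = readScores(Path.of(args[0]));
    Map<String, Double> current = readScores(Path.of(args[1]));
    for (Map.Entry<String, Double> e : current.entrySet()) {
      Double base = baseline.get(e.getKey());
      if (base == null || base == 0.0) continue;
      System.out.printf("%s: %.2fx of baseline%n", e.getKey(), e.getValue() / base);
    }
  }
}
```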
Required fields per benchmark

Every benchmark report item should have at least these properties:

`Bench.measure`

A `CHANGELOG` should be provided, where we write down all the version changes for all the benchmarks. We should also think about a better file format than XML; it is, for example, easier to manipulate JSON files.
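For illustration only, a sketch of what a richer report item could carry; the field names (`commitId`, `benchmarkVersion`, etc.) are assumptions, not a decided schema:

```java
import java.time.Instant;

/**
 * Illustrative shape of a single benchmark report item with more context than just the score.
 * The exact field set is still to be decided in this issue.
 */
public record BenchReportItem(
    String label,          // benchmark name
    double score,          // iterations per millisecond
    String commitId,       // commit the benchmark ran against
    Instant timestamp,     // when the run finished
    int benchmarkVersion   // bumped whenever the benchmark changes, tracked in the CHANGELOG
) {

  /** Naive JSON rendering; a real implementation would use a JSON library. */
  public String toJson() {
    return String.format(
        "{\"label\":\"%s\",\"score\":%f,\"commitId\":\"%s\",\"timestamp\":\"%s\",\"benchmarkVersion\":%d}",
        label, score, commitId, timestamp, benchmarkVersion);
  }
}
```

Emitting such items as JSON, or storing them directly in a database as discussed below, would make automated processing much easier than diffing XML artifacts.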
Sufficient warmup
We should also think about how to ensure that there are no ongoing Graal compilations during the measurement phase, i.e., we should ensure that the benchmark is stable. Ongoing compilations during measurement may signal that the warmup was insufficient, and they will also skew the score. Note that compilations are expected during the warmup iterations.
An idea for how to automatically track sufficient warmup is described in https://github.com/enso-org/enso/issues/6271#issuecomment-1507924937
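One possible heuristic (a sketch, not the approach from the linked comment): sample the accumulated JIT compilation time exposed by the standard `CompilationMXBean` around each measurement iteration; if it keeps growing, compilations are still in flight and the warmup was likely insufficient.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

/** Heuristic check that no JIT compilation happened while a measurement iteration ran. */
public final class CompilationStabilityCheck {
  private final CompilationMXBean compilationBean = ManagementFactory.getCompilationMXBean();
  private long lastCompilationTime = -1;

  /** Returns true if the accumulated JIT compilation time grew since the previous call. */
  public boolean compilationHappenedSinceLastCheck() {
    if (compilationBean == null || !compilationBean.isCompilationTimeMonitoringSupported()) {
      // Cannot tell on this VM; be conservative and report no compilation.
      return false;
    }
    long current = compilationBean.getTotalCompilationTime();
    boolean changed = lastCompilationTime >= 0 && current > lastCompilationTime;
    lastCompilationTime = current;
    return changed;
  }
}
```

Calling `compilationHappenedSinceLastCheck()` after every measurement iteration and flagging (or re-warming) when it returns true would make an unstable benchmark visible in the report instead of silently skewing the score. Note that this only observes host JIT activity; whether it captures Graal/Truffle compilations of Enso code depends on how the benchmarks are run.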
The single endpoint can be, e.g., a connection to a MongoDB instance. The Enso script for processing the results can connect to this MongoDB and download all the relevant benchmark reports.
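A minimal sketch of such a consumer using the MongoDB Java sync driver; the connection string, database, collection, and field names are hypothetical:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

/** Downloads all stored benchmark reports from a central MongoDB instance. */
public class DownloadBenchReports {
  public static void main(String[] args) {
    // Hypothetical connection string and names; the real deployment would differ.
    try (MongoClient client = MongoClients.create("mongodb://bench-db.example.com:27017")) {
      MongoCollection<Document> reports =
          client.getDatabase("benchmarks").getCollection("reports");
      for (Document report : reports.find()) {
        System.out.printf(
            "%s @ %s: score=%s%n",
            report.getString("label"),
            report.getString("commitId"),
            report.get("score"));
      }
    }
  }
}
```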
Related issues
Edit 2023-08-22
The current state is that:
I am still not closing this issue, since it contains valuable information about what we should include in the benchmark result output. Currently, the benchmark results are still XML files with a single double score value per benchmark.