cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachprod-microbench: publish comparison results to InfluxDB #128886

Closed herkolategan closed 2 weeks ago

herkolategan commented 1 month ago

At the moment, the primary way to inspect microbenchmark regressions from the microbenchmark weekly job [1] is to look at the Google Sheets produced by the job. However, these sheets do not always contain enough information to make an informed decision about a regression.

To improve on this, we should find a way to display microbenchmark history alongside the results. There are a few options:

  1. Something similar to roachperf [2]
  2. Export to an existing system like Prometheus / Grafana / DataDog
  3. Consider a 3rd party example like Golang's perf / microbenchmarks dashboard [3]

With regard to option 1, there are plans to eventually deprecate roachperf. Option 2 was considered at one point, but tools like Grafana and DataDog are not really designed for granular, single-point metrics that are compared against a baseline, nor do they offer good ways of consolidating regressions and presenting them in an easily consumable form.

This currently makes option 3 the most attractive, and since many of its implementation details are publicly available, it is a good starting point.

Golang's perf dashboards use a combination of cloud storage and InfluxDB (a time series database) to store their metrics. Cloud storage, combined with a SQL database that serves as a data index, holds the source of truth for the comparisons (the raw logs from the microbenchmarks). The comparisons themselves (processed results comparing two revisions) are stored in InfluxDB to make them readily available for display on the dashboards.

This issue mainly deals with the way in which we plan to store the metrics in InfluxDB for our own purposes. The format in which Golang currently inserts microbenchmark data points is fairly simple and something that we can reuse.

| Field | Description |
| --- | --- |
| low | lower bound of summary |
| center | center of summary |
| high | higher bound of summary |
| upload-time | timestamp of the run |
| baseline-commit | latest stable version of cockroach |
| experiment-commit | revision whose performance is evaluated against the baseline |
| benchmarks-commit | revision of the microbenchmark framework used (usually the same as experiment) |

| Tag | Description |
| --- | --- |
| name | name of the microbenchmark |
| unit | unit of performance measured (e.g., sec/op) |
| pkg | package the benchmark is from |
| repository | cockroach |
| branch | branch the benchmarks are from |
| goarch | machine architecture |
| goos | machine operating system |
| machine-type | the cloud vendor's type name for the machine (e.g., n2-standard-32) |

InfluxDB supports both fields and tags for a datapoint. Fields are not indexed, whereas tags are. Tags are useful for searching and filtering results.

A datapoint (measurement) will typically contain the commit SHAs for both the experiment and the baseline, where the experiment is the version being evaluated and the baseline is the stable version to evaluate against. In addition to storing the comparison, we also capture metadata about the conditions and environment in which the microbenchmark was run.

In order to be compatible with Golang's dashboard it is necessary to consider the queries that will be run against InfluxDB [4]. We can still optionally add our own metadata or other info to the datapoints that will get written to InfluxDB.

[1] https://github.com/cockroachdb/cockroach/blob/master/build/teamcity/cockroach/nightlies/microbenchmark_weekly.sh
[2] https://roachperf.crdb.dev/
[3] https://perf.golang.org/dashboard/
[4] https://github.com/golang/build/blob/master/perf/app/dashboard.go#L300

Jira issue: CRDB-41257

blathers-crl[bot] commented 1 month ago

cc @cockroachdb/test-eng