conbench/conbench

Language-independent Continuous Benchmarking (CB) Framework
https://conbench.github.io/conbench/
MIT License

Build API for 'regression candidate detection methods' #634

Open jgehrcke opened 1 year ago

jgehrcke commented 1 year ago

Based on https://github.com/conbench/conbench/issues/530.

Plan for multiple 'regression candidate detection methods', because there will be no silver bullet. Each method has a name/identifier. An API client can then ask Conbench for an opinion about regression, and each method reports independently (a binary decision, or maybe a continuous numerical outcome).
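To make that a bit more concrete, here is a rough sketch of what a per-method response could look like. This is purely hypothetical: no such endpoint or schema exists yet, and the method identifiers and field names are made up for illustration.

```python
# Purely hypothetical sketch; method identifiers and field names are
# placeholders, not an existing Conbench API.
example_regression_opinion = {
    "contender_result_id": "some-contender-benchmark-result-id",
    "baseline_result_id": "some-baseline-benchmark-result-id",
    "analyses": [
        {
            # the new, basic pairwise method described below
            "method": "pairwise-multisample-uncertainty",
            "regression": False,  # binary decision
            "score": 0.7,         # optional continuous outcome
        },
        {
            # the existing method that looks at more than one data point
            "method": "lookback-z-score",
            "regression": True,
            "score": -6.2,
        },
    ],
}
```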

One simple method we want to introduce here (a new, basic method): comparing two multi-sample data points (baseline and contender) using the uncertainty derived from multi-sampling, plus potentially an additional static tolerance.
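A minimal sketch of what such a pairwise comparison could look like, assuming 'lower is better' (e.g. durations); the function name, the 2-sigma default, and the tolerance handling are illustrative choices, not a proposed implementation:

```python
import statistics


def flag_regression_pairwise(
    baseline_samples: list[float],
    contender_samples: list[float],
    n_sigma: float = 2.0,
    static_tolerance: float = 0.0,  # e.g. 0.05 for an extra 5 % allowance
) -> bool:
    """Sketch of the basic pairwise method: compare two multi-sample data
    points using the uncertainty derived from multi-sampling, plus an
    optional static tolerance. Assumes 'lower is better'."""
    b_mean = statistics.mean(baseline_samples)
    c_mean = statistics.mean(contender_samples)
    # Standard error of the mean for each side, combined in quadrature.
    b_sem = statistics.stdev(baseline_samples) / len(baseline_samples) ** 0.5
    c_sem = statistics.stdev(contender_samples) / len(contender_samples) ** 0.5
    combined_uncertainty = (b_sem**2 + c_sem**2) ** 0.5
    # Flag only if the contender is slower than the baseline by more than the
    # multi-sampling uncertainty plus the static tolerance on the baseline mean.
    threshold = n_sigma * combined_uncertainty + static_tolerance * b_mean
    return c_mean - b_mean > threshold
```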

This should also report the result of the existing method that looks at more than one datapoint (https://github.com/conbench/conbench/issues/583).

austin3dickey commented 1 year ago

We currently return a few (very underdocumented!) different comparisons from the /compare APIs:

[screenshot: comparison fields returned by the /compare APIs]

For example, the contender_z_score metric is a numerical continuous outcome, which is then thresholded by the input API parameter threshold_z to produce the binary contender_z_regression metric. That system uses the whole past distribution to inform the metrics.
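For illustration, the relationship between the continuous and the binary metric is roughly the following (a simplified sketch, not the actual Conbench code; sign convention and the 'lower is better' assumption are simplifications):

```python
import statistics


def z_score_comparison(
    past_values: list[float], contender_value: float, threshold_z: float
) -> dict:
    # Sketch only, not the actual Conbench implementation. The continuous
    # contender_z_score is computed against the whole past distribution and
    # then thresholded by threshold_z to yield the binary
    # contender_z_regression. Assumes 'lower is better'.
    mean = statistics.mean(past_values)
    stdev = statistics.stdev(past_values)
    z = (contender_value - mean) / stdev
    return {
        "contender_z_score": z,
        "contender_z_regression": z > threshold_z,
    }
```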

Separately, the change metric only looks at the percent difference between baseline and contender, and is thresholded by the threshold parameter to create the regression metric.
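Again roughly (a simplified sketch, not the actual code; assumes 'lower is better' and a threshold expressed in percent):

```python
def percent_change_comparison(
    baseline_value: float, contender_value: float, threshold: float
) -> dict:
    # Simplified sketch, not the actual Conbench implementation: only the
    # percent difference between baseline and contender is considered, and
    # it is thresholded to yield the binary regression flag.
    change = 100.0 * (contender_value - baseline_value) / baseline_value
    return {
        "change": change,
        "regression": change > threshold,
    }
```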

(This comment is not declaring that this is the best API; it's just meant to point out prior art. 🙂)