I totally think this is in scope! I think we'll need to change a few things. Off the top of my head:
- the current time series rely on a lexicographic sort of nightly dates producing the correct order, but we'll need some way to either segment toolchains into categories or to provide some other ordering mechanism for toolchains when running the analysis (this will also matter once we can run benchmarks for each merge's toolchain)
- my previous attempts at a broad score summarizing performance were a bit too hacky, but we'll need to figure out some metric to display for stable toolchains on the homepage, because the current anomaly detection relies on having a bunch of prior data points to compare against
- probably some other things will break that I'm not thinking of
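To make the ordering point concrete, here's a minimal sketch of what an explicit sort key could look like. ISO-formatted nightly dates happen to sort correctly as strings, but once stable/beta versions are in the mix we'd need a channel rank plus a per-channel key. All names here (`Toolchain`, `CHANNEL_ORDER`, `sort_key`) are illustrative, not anything that exists in the codebase today:

```python
from dataclasses import dataclass

# Hypothetical channel ranking: stable before beta before nightly.
CHANNEL_ORDER = {"stable": 0, "beta": 1, "nightly": 2}

@dataclass
class Toolchain:
    channel: str  # "stable", "beta", or "nightly"
    spec: str     # version like "1.75.0" or date like "2024-01-15"

def sort_key(tc: Toolchain):
    if tc.channel == "nightly":
        # ISO dates compare correctly as plain strings.
        return (CHANNEL_ORDER[tc.channel], tc.spec)
    # Split versions into integer tuples so "1.10.0" sorts after "1.9.0",
    # which a lexicographic string sort would get wrong.
    return (CHANNEL_ORDER[tc.channel],
            tuple(int(part) for part in tc.spec.split(".")))
```

The per-channel keys never get compared against each other because the channel rank differs first, so mixing string keys (dates) and tuple keys (versions) is safe.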
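On the summary-metric point, one less hacky option (just a sketch, not a proposal for the final design) is the geometric mean of per-benchmark timings relative to some chosen baseline toolchain, which doesn't need a history of prior data points the way the anomaly detection does. The function and dict shapes below are assumptions for illustration:

```python
import math

def summary_score(results: dict[str, float], baseline: dict[str, float]) -> float:
    """Geometric mean of result/baseline timing ratios over shared benchmarks.

    1.0 means on par with the baseline toolchain; below 1.0 is faster.
    Using logs keeps the computation numerically stable and makes the
    ratios multiply rather than add, so no single benchmark dominates.
    """
    shared = results.keys() & baseline.keys()
    if not shared:
        raise ValueError("no benchmarks in common with baseline")
    log_sum = sum(math.log(results[b] / baseline[b]) for b in shared)
    return math.exp(log_sum / len(shared))
```

For example, one benchmark at 2x the baseline and another at 0.5x would average out to a score of 1.0, which is the usual argument for a geometric rather than arithmetic mean over ratios.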
Is producing results/graphs of the following in scope here, particularly having the website highlight them or enable displaying them?
And possibly something similar for the beta train?