Given the large fluctuation of the benchmarks, comparing only the last
two results produces a large number of false-positive alerts.
Furthermore, it has the "boiling frog" problem: we don't detect
slow regressions over time if we constantly ignore 10%+ differences.

This changes the approach: it now looks at the last 20 results and
compares the latest with the worst of the previous 19. This should
reduce the chance of false-positive alerts and at the same time
improve the chance of detecting slow regressions.
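The described check could be sketched as follows. This is a hypothetical illustration, not the actual implementation; the function name, window size, and threshold parameter are assumptions:

```python
def should_alert(results, window=20, threshold=1.10):
    """Alert only if the latest benchmark time exceeds the worst
    (slowest) of the previous results in the window by the threshold.

    Hypothetical sketch: `results` is a list of benchmark times where
    larger means slower; the 10% threshold mirrors the "10%+ differences"
    mentioned above.
    """
    if len(results) < 2:
        return False
    window_results = results[-window:]
    latest = window_results[-1]
    worst_previous = max(window_results[:-1])  # slowest prior run in the window
    return latest > worst_previous * threshold
```

Comparing against the worst of the window absorbs normal run-to-run noise, while a genuine slow regression eventually pushes the latest result past even the noisiest prior run.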