catapult-project / catapult

Deprecated Catapult GitHub. Please instead use the http://crbug.com "Speed>Benchmarks" component for bugs and https://chromium.googlesource.com/catapult for downloading and editing source code.
https://chromium.googlesource.com/catapult
BSD 3-Clause "New" or "Revised" License

Include p-value (t-statistic?) on results page. #4609

Closed bfgeek closed 5 years ago

bfgeek commented 5 years ago

One thing I've noticed is that folks sometimes "p-hack" results, rely on an intuition of which tests matter in a particular benchmark suite, or run a benchmark multiple times to get a better signal out of it.

It might be helpful if the results page had an additional column with a p-value (based on t-stat?) 🤷‍♂️

benshayden commented 5 years ago

When a reference column is selected, open cells display a p-value indicating (warning, approximate statistical language ahead) the probability that the cell is indistinguishable from the reference cell (same row, reference column). I'm not sure if that's the p-value you might be looking for?
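To make the idea of comparing a cell against its reference cell concrete, here is a minimal sketch of one way to get a p-value for two sets of benchmark samples. It uses a two-sided permutation test on the difference of means, which is an assumption chosen for illustration and not necessarily the test results.html runs. The function name `permutation_p_value`, its parameters, and the sample numbers are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(sample, reference, iterations=10000, seed=0):
    """Two-sided permutation test on the difference of means.

    Hypothetical helper for illustration; not part of catapult.
    Returns the estimated probability of seeing a difference at least
    as large as the observed one if both groups came from the same
    distribution.
    """
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    observed = abs(mean(sample) - mean(reference))
    pooled = list(sample) + list(reference)
    n = len(sample)
    hits = 0
    for _ in range(iterations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (iterations + 1)


# Made-up timings (ms) for a reference build and a candidate build.
reference_ms = [12.0, 12.1, 11.9, 12.2, 12.0]
candidate_ms = [10.1, 10.2, 10.0, 10.3, 10.1]
p = permutation_p_value(candidate_ms, reference_ms)
print(f"p-value: {p:.4f}")
```

A small p-value here says the two sets of samples are unlikely to come from the same distribution. It says nothing about whether the difference is large enough to matter.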

I think that running the same benchmark multiple times, or passing --pageset-repeat to run_benchmark, is a valid way to strengthen the signal, right?
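As a rough illustration of why more repetitions strengthen the signal, the sketch below computes the standard error of the mean, which shrinks as the number of samples grows when the spread stays similar. The helper name `standard_error` and the numbers are made up for the example, and the larger set is built by simply repeating the smaller one to keep the spread comparable. The point only holds if every run is kept and pooled, rather than picking the runs that look best.

```python
from math import sqrt
from statistics import stdev


def standard_error(samples):
    """Standard error of the mean: sample stdev divided by sqrt(n)."""
    return stdev(samples) / sqrt(len(samples))


# Made-up timings (ms) from a handful of runs.
few_runs = [100.0, 104.0, 98.0, 102.0]
# Hypothetical larger set with a similar spread (4x as many samples).
many_runs = few_runs * 4

se_few = standard_error(few_runs)
se_many = standard_error(many_runs)
print(f"standard error with {len(few_runs)} samples: {se_few:.3f}")
print(f"standard error with {len(many_runs)} samples: {se_many:.3f}")
```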

The best way to combat p-hacking that I've found so far is to educate users on how to avoid it. Every results.html links to the doc, where I've tried to explain statistical significance as best I can without taking a year off to write a textbook, but patches are always welcome, either for results.html or the doc.

I'm going to close this since we're moving to monorail. This bug and the tbm2 component might be relevant. Please feel free to comment there or file a new crbug.