add an option to show ratio instead of percent delta (possibly by default)

dweiller commented 1 year ago

I think percentages are harder to comprehend than ratios, especially when the delta is large. An example:

Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         delta
  wall_time        26.565ms ± 3.741ms     25.269ms … 52.593ms          18 ( 5%)        0%
  peak_rss         16M ± 1K               16M … 16M                    90 (24%)        0%
  cpu_cycles       29460307 ± 352152      27881614 … 34087924          25 ( 7%)        0%
  instructions     68245274 ± 3           68245252 … 68245299           8 ( 2%)        0%
  cache_references 1905677 ± 11890        1885623 … 2050296             4 ( 1%)        0%
  cache_misses     35904 ± 994            34424 … 51464                 7 ( 2%)        0%
  branch_misses    18101 ± 75             18032 … 19280                16 ( 4%)        0%
Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         delta
  wall_time        499.521ms ± 16.373ms   486.703ms … 547.971ms         2 (10%)        💩+1780.3% ±  8.6%
  peak_rss         30M ± 2K               30M … 30M                     0 ( 0%)        💩+ 89.5% ±  0.0%
  cpu_cycles       1436695570 ± 36769236  1385230017 … 1548137633       4 (19%)        💩+4776.7% ± 12.4%
  instructions     443694437 ± 8150479    433293521 … 465060803         2 (10%)        💩+550.1% ±  1.2%
  cache_references 51072489 ± 227383      50490604 … 51378709           1 ( 5%)        💩+2580.0% ±  1.2%
  cache_misses     21754058 ± 27034       21713445 … 21806249           0 ( 0%)        💩+60490.3% ±  7.5%
  branch_misses    4283745 ± 190618       4059269 … 4782911             1 ( 5%)        💩+23565.4% ± 104.1%
Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         delta
  wall_time        168.705ms ± 17.803ms   149.247ms … 226.071ms         1 ( 2%)        💩+535.1% ±  7.6%
  peak_rss         17M ± 2K               17M … 17M                     0 ( 0%)        💩+  8.0% ±  0.0%
  cpu_cycles       483128706 ± 36179722   453889331 … 625311678         6 (10%)        💩+1539.9% ± 12.3%
  instructions     657956378 ± 7          657956360 … 657956401         2 ( 3%)        💩+864.1% ±  0.0%
  cache_references 5418462 ± 3474959      3555473 … 26208274            5 ( 8%)        💩+184.3% ± 18.3%
  cache_misses     563791 ± 85035         512236 … 1009351              9 (15%)        💩+1470.3% ± 23.8%
  branch_misses    1009697 ± 704651       823842 … 4319352             11 (18%)        💩+5478.0% ± 391.1%

Here is the same benchmark run with the worst one first:

Benchmark 1 (10 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         delta
  wall_time        501.108ms ± 12.548ms   487.794ms … 516.363ms         0 ( 0%)        0%
  peak_rss         30M ± 1K               30M … 30M                     2 (20%)        0%
  cpu_cycles       1483284321 ± 13871550  1451769864 … 1497658793       2 (20%)        0%
  instructions     440695501 ± 8102579    432198649 … 459702251         0 ( 0%)        0%
  cache_references 51048638 ± 241437      50737084 … 51402025           0 ( 0%)        0%
  cache_misses     21761058 ± 32075       21698123 … 21794997           0 ( 0%)        0%
  branch_misses    4199931 ± 180500       4013353 … 4614098             0 ( 0%)        0%
Benchmark 2 (31 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         delta
  wall_time        161.333ms ± 13.266ms   143.47ms … 192.347ms          0 ( 0%)        ⚡- 67.8% ±  1.9%
  peak_rss         17M ± 2K               17M … 17M                     0 ( 0%)        ⚡- 43.0% ±  0.0%
  cpu_cycles       461790403 ± 7477095    451871827 … 478106740         0 ( 0%)        ⚡- 68.9% ±  0.5%
  instructions     657956376 ± 5          657956369 … 657956387         0 ( 0%)        💩+ 49.3% ±  0.7%
  cache_references 3920322 ± 257185       3555405 … 4573300             0 ( 0%)        ⚡- 92.3% ±  0.4%
  cache_misses     518445 ± 4596          510041 … 528130               0 ( 0%)        ⚡- 97.6% ±  0.1%
  branch_misses    824224 ± 250           823674 … 824660               0 ( 0%)        ⚡- 80.4% ±  1.5%
Benchmark 3 (147 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         delta
  wall_time        34.088ms ± 12.053ms    24.99ms … 55.173ms            0 ( 0%)        ⚡- 93.2% ±  1.5%
  peak_rss         16M ± 2K               16M … 16M                    38 (26%)        ⚡- 47.2% ±  0.0%
  cpu_cycles       29127566 ± 811043      27831022 … 30678771           0 ( 0%)        ⚡- 98.0% ±  0.1%
  instructions     68245275 ± 2           68245272 … 68245280           3 ( 2%)        ⚡- 84.5% ±  0.3%
  cache_references 1916213 ± 34330        1887064 … 2306377             1 ( 1%)        ⚡- 96.2% ±  0.1%
  cache_misses     36561 ± 922            35153 … 39289                 1 ( 1%)        ⚡- 99.8% ±  0.0%
  branch_misses    18107 ± 57             18030 … 18373                 3 ( 2%)        ⚡- 99.6% ±  0.7%

I think something like this is much easier to grok:

Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         ratio
  wall_time        26.565ms ± 3.741ms     25.269ms … 52.593ms          18 ( 5%)        1x
  peak_rss         16M ± 1K               16M … 16M                    90 (24%)        1x
  cpu_cycles       29460307 ± 352152      27881614 … 34087924          25 ( 7%)        1x
  instructions     68245274 ± 3           68245252 … 68245299           8 ( 2%)        1x
  cache_references 1905677 ± 11890        1885623 … 2050296             4 ( 1%)        1x
  cache_misses     35904 ± 994            34424 … 51464                 7 ( 2%)        1x
  branch_misses    18101 ± 75             18032 … 19280                16 ( 4%)        1x
Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         ratio
  wall_time        499.521ms ± 16.373ms   486.703ms … 547.971ms         2 (10%)        💩18.803x ±  0.086
  peak_rss         30M ± 2K               30M … 30M                     0 ( 0%)        💩 1.895x ±  0.000
  cpu_cycles       1436695570 ± 36769236  1385230017 … 1548137633       4 (19%)        💩48.767x ± 0.124
  instructions     443694437 ± 8150479    433293521 … 465060803         2 (10%)        💩6.501x ±  0.012
  cache_references 51072489 ± 227383      50490604 … 51378709           1 ( 5%)        💩26.800x ±  0.012
  cache_misses     21754058 ± 27034       21713445 … 21806249           0 ( 0%)        💩605.903x ±  0.075
  branch_misses    4283745 ± 190618       4059269 … 4782911             1 ( 5%)        💩236.654x ± 1.041
Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                   outliers         ratio
  wall_time        168.705ms ± 17.803ms   149.247ms … 226.071ms         1 ( 2%)        💩6.351x ±  0.076
  peak_rss         17M ± 2K               17M … 17M                     0 ( 0%)        💩 1.080x ±  0.000
  cpu_cycles       483128706 ± 36179722   453889331 … 625311678         6 (10%)        💩16.399x ± 0.123
  instructions     657956378 ± 7          657956360 … 657956401         2 ( 3%)        💩9.641x ±  0.000
  cache_references 5418462 ± 3474959      3555473 … 26208274            5 ( 8%)        💩2.843x ± 0.183
  cache_misses     563791 ± 85035         512236 … 1009351              9 (15%)        💩15.703x ± 0.238
  branch_misses    1009697 ± 704651       823842 … 4319352             11 (18%)        💩55.780x ± 3.911

(I didn't properly convert the numbers on the confidence intervals to a ratio, so they'll be a bit off)
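
For reference, a minimal sketch of the conversion used in the mock-up above, assuming the delta column is `(mean - reference_mean) / reference_mean` expressed as a percentage. The function names are hypothetical and not part of poop. The interval is only rescaled here, which is an assumption and not a properly derived confidence interval for a ratio.

```python
def ratio_from_delta(delta_percent: float) -> float:
    """Convert a percent delta against the reference into a ratio.

    Assumes delta_percent = (mean - reference_mean) / reference_mean * 100,
    so the ratio mean / reference_mean is 1 + delta_percent / 100.
    """
    return 1.0 + delta_percent / 100.0


def half_width_from_delta(half_width_percent: float) -> float:
    """Rescale the ± part from percent to ratio units.

    Assumption: a plain rescale, not a real ratio confidence interval.
    """
    return half_width_percent / 100.0


# wall_time for gpa against zimalloc in the first run: +1780.3% ± 8.6%
print(f"{ratio_from_delta(1780.3):.3f}x ± {half_width_from_delta(8.6):.3f}")
# prints "18.803x ± 0.086"
```

The same formula handles improvements: a delta of -93.2% becomes a ratio of about 0.068.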

The ratio will be even easier to read (relative to the delta) if you also truncate some of the less significant figures, in which case the ratio needs fewer digits than the delta to represent the same performance difference (assuming we don't want to use scientific notation for the delta).
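
One way the truncation could look, as a rough sketch: round each ratio to a fixed number of significant figures without falling back to scientific notation. `format_ratio` and its default of three significant figures are assumptions for illustration, not anything poop implements.

```python
import math


def format_ratio(ratio: float, sig_figs: int = 3) -> str:
    """Format a ratio with about sig_figs significant figures, never in
    scientific notation (hypothetical helper)."""
    if ratio == 0:
        return "0x"
    # Position of the leading digit: 0 for 1.08, 1 for 18.8, 2 for 606.
    magnitude = math.floor(math.log10(abs(ratio)))
    # Keep only as many decimals as are needed to reach sig_figs digits.
    decimals = max(sig_figs - 1 - magnitude, 0)
    return f"{ratio:.{decimals}f}x"


for r in (1.080, 18.803, 605.903):
    print(format_ratio(r))
```

With three significant figures, 18.803 is shown as `18.8x`, while the equivalent delta `+1780.3%` would need four digits (`+1780%`) to carry the same information.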

squeek502 commented 1 year ago

I agree about the hyperfine-style times faster/times slower being easier to understand. My suggestion would be to change the header to:

times faster/slower

instead of

ratio

since the 'times faster/slower' part is necessary context for what something like ⚡2.0x means.

dweiller commented 1 year ago

'Faster/slower' only works for the wall_time measurement. I think I'd just show the ratio measurement / reference: good/green if it's significantly < 1, bad/red if it's significantly > 1, and gray when it's not significantly different from 1. My example wasn't great, as everything was worse than the reference.
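
A small sketch of that colouring rule, assuming "significantly" means the interval `ratio ± half_width` does not contain 1. That reading is an assumption, and `classify` is a hypothetical name, not poop's actual logic.

```python
def classify(ratio: float, half_width: float) -> str:
    """Classify a measurement / reference ratio for display.

    Assumption: the difference counts as significant only when the whole
    interval [ratio - half_width, ratio + half_width] lies on one side of 1.
    """
    if ratio + half_width < 1.0:
        return "good"     # green: significantly below the reference
    if ratio - half_width > 1.0:
        return "bad"      # red: significantly above the reference
    return "neutral"      # gray: not distinguishable from the reference


print(classify(18.803, 0.086))  # gpa wall_time against zimalloc
print(classify(0.068, 0.015))   # zimalloc wall_time against gpa
print(classify(1.02, 0.05))     # made-up value whose interval straddles 1
```

This treats every measurement as lower-is-better, which holds for all the rows in the tables above.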

andrewrk / poop

add an option to show ratio instead of percent delta (possibly by default) #19