andrewrk / poop

Performance Optimizer Observation Platform
MIT License

add an option to show ratio instead of percent delta (possibly by default) #19

Open dweiller opened 1 year ago

dweiller commented 1 year ago

I think percentages are harder to comprehend than ratios, especially when the delta is large. An example:

Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        26.565ms ± 3.741ms     25.269ms … 52.593ms       18 ( 5%)        0%
  peak_rss         16M ± 1K               16M … 16M                 90 (24%)        0%
  cpu_cycles       29460307 ± 352152      27881614 … 34087924       25 ( 7%)        0%
  instructions     68245274 ± 3           68245252 … 68245299        8 ( 2%)        0%
  cache_references 1905677 ± 11890        1885623 … 2050296          4 ( 1%)        0%
  cache_misses     35904 ± 994            34424 … 51464              7 ( 2%)        0%
  branch_misses    18101 ± 75             18032 … 19280             16 ( 4%)        0%
Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        499.521ms ± 16.373ms   486.703ms … 547.971ms      2 (10%)        💩+1780.3% ±  8.6%
  peak_rss         30M ± 2K               30M … 30M                  0 ( 0%)        💩+ 89.5% ±  0.0%
  cpu_cycles       1436695570 ± 36769236  1385230017 … 1548137633    4 (19%)        💩+4776.7% ± 12.4%
  instructions     443694437 ± 8150479    433293521 … 465060803      2 (10%)        💩+550.1% ±  1.2%
  cache_references 51072489 ± 227383      50490604 … 51378709        1 ( 5%)        💩+2580.0% ±  1.2%
  cache_misses     21754058 ± 27034       21713445 … 21806249        0 ( 0%)        💩+60490.3% ±  7.5%
  branch_misses    4283745 ± 190618       4059269 … 4782911          1 ( 5%)        💩+23565.4% ± 104.1%
Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        168.705ms ± 17.803ms   149.247ms … 226.071ms      1 ( 2%)        💩+535.1% ±  7.6%
  peak_rss         17M ± 2K               17M … 17M                  0 ( 0%)        💩+  8.0% ±  0.0%
  cpu_cycles       483128706 ± 36179722   453889331 … 625311678      6 (10%)        💩+1539.9% ± 12.3%
  instructions     657956378 ± 7          657956360 … 657956401      2 ( 3%)        💩+864.1% ±  0.0%
  cache_references 5418462 ± 3474959      3555473 … 26208274         5 ( 8%)        💩+184.3% ± 18.3%
  cache_misses     563791 ± 85035         512236 … 1009351           9 (15%)        💩+1470.3% ± 23.8%
  branch_misses    1009697 ± 704651       823842 … 4319352          11 (18%)        💩+5478.0% ± 391.1%

Here is the same benchmark run with the worst one first:

Benchmark 1 (10 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        501.108ms ± 12.548ms   487.794ms … 516.363ms      0 ( 0%)        0%
  peak_rss         30M ± 1K               30M … 30M                  2 (20%)        0%
  cpu_cycles       1483284321 ± 13871550  1451769864 … 1497658793    2 (20%)        0%
  instructions     440695501 ± 8102579    432198649 … 459702251      0 ( 0%)        0%
  cache_references 51048638 ± 241437      50737084 … 51402025        0 ( 0%)        0%
  cache_misses     21761058 ± 32075       21698123 … 21794997        0 ( 0%)        0%
  branch_misses    4199931 ± 180500       4013353 … 4614098          0 ( 0%)        0%
Benchmark 2 (31 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        161.333ms ± 13.266ms   143.47ms … 192.347ms       0 ( 0%)        - 67.8% ±  1.9%
  peak_rss         17M ± 2K               17M … 17M                  0 ( 0%)        - 43.0% ±  0.0%
  cpu_cycles       461790403 ± 7477095    451871827 … 478106740      0 ( 0%)        - 68.9% ±  0.5%
  instructions     657956376 ± 5          657956369 … 657956387      0 ( 0%)        💩+ 49.3% ±  0.7%
  cache_references 3920322 ± 257185       3555405 … 4573300          0 ( 0%)        - 92.3% ±  0.4%
  cache_misses     518445 ± 4596          510041 … 528130            0 ( 0%)        - 97.6% ±  0.1%
  branch_misses    824224 ± 250           823674 … 824660            0 ( 0%)        - 80.4% ±  1.5%
Benchmark 3 (147 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         delta
  wall_time        34.088ms ± 12.053ms    24.99ms … 55.173ms         0 ( 0%)        - 93.2% ±  1.5%
  peak_rss         16M ± 2K               16M … 16M                 38 (26%)        - 47.2% ±  0.0%
  cpu_cycles       29127566 ± 811043      27831022 … 30678771        0 ( 0%)        - 98.0% ±  0.1%
  instructions     68245275 ± 2           68245272 … 68245280        3 ( 2%)        - 84.5% ±  0.3%
  cache_references 1916213 ± 34330        1887064 … 2306377          1 ( 1%)        - 96.2% ±  0.1%
  cache_misses     36561 ± 922            35153 … 39289              1 ( 1%)        - 99.8% ±  0.0%
  branch_misses    18107 ± 57             18030 … 18373              3 ( 2%)        - 99.6% ±  0.7%

I think something like this is much easier to grok:

Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         ratio
  wall_time        26.565ms ± 3.741ms     25.269ms … 52.593ms       18 ( 5%)        1x
  peak_rss         16M ± 1K               16M … 16M                 90 (24%)        1x
  cpu_cycles       29460307 ± 352152      27881614 … 34087924       25 ( 7%)        1x
  instructions     68245274 ± 3           68245252 … 68245299        8 ( 2%)        1x
  cache_references 1905677 ± 11890        1885623 … 2050296          4 ( 1%)        1x
  cache_misses     35904 ± 994            34424 … 51464              7 ( 2%)        1x
  branch_misses    18101 ± 75             18032 … 19280             16 ( 4%)        1x
Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         ratio
  wall_time        499.521ms ± 16.373ms   486.703ms … 547.971ms      2 (10%)        💩18.803x ±  0.086
  peak_rss         30M ± 2K               30M … 30M                  0 ( 0%)        💩 1.895x ±  0.000
  cpu_cycles       1436695570 ± 36769236  1385230017 … 1548137633    4 (19%)        💩48.767x ±  0.124
  instructions     443694437 ± 8150479    433293521 … 465060803      2 (10%)        💩6.501x ±  0.012
  cache_references 51072489 ± 227383      50490604 … 51378709        1 ( 5%)        💩26.800x ±  0.012
  cache_misses     21754058 ± 27034       21713445 … 21806249        0 ( 0%)        💩605.903x ±  0.075
  branch_misses    4283745 ± 190618       4059269 … 4782911          1 ( 5%)        💩236.654x ±  1.041
Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement      mean ± σ               min … max                outliers         ratio
  wall_time        168.705ms ± 17.803ms   149.247ms … 226.071ms      1 ( 2%)        💩6.351x ±  0.076
  peak_rss         17M ± 2K               17M … 17M                  0 ( 0%)        💩 1.080x ±  0.000
  cpu_cycles       483128706 ± 36179722   453889331 … 625311678      6 (10%)        💩16.399x ± 0.123
  instructions     657956378 ± 7          657956360 … 657956401      2 ( 3%)        💩9.641x ±  0.000
  cache_references 5418462 ± 3474959      3555473 … 26208274         5 ( 8%)        💩2.843x ± 0.183
  cache_misses     563791 ± 85035         512236 … 1009351           9 (15%)        💩15.703x ± 0.238
  branch_misses    1009697 ± 704651       823842 … 4319352          11 (18%)        💩55.780x ± 3.911

(I didn't properly convert the numbers on the confidence intervals to a ratio, so they'll be a bit off)

The ratio will be even easier to read (relative to the delta) if you also truncate some of the less significant figures. In that case the ratio needs fewer digits than the delta to represent the same performance difference (assuming we don't want to use scientific notation for the delta).
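For illustration, the conversion and truncation described above could be sketched like this in Python. This is not poop's code: the function names are hypothetical, and it assumes the delta and its ± figure are both percentages of the reference mean, which the thread does not confirm.

```python
import math


def delta_to_ratio(delta_pct: float, half_width_pct: float) -> tuple[float, float]:
    """Convert a percent delta and its ± half-width into a ratio and half-width.

    Assumption: both inputs are percentages of the reference mean, so the
    ratio is 1 + delta/100 and the half-width only needs rescaling.
    """
    return 1.0 + delta_pct / 100.0, half_width_pct / 100.0


def format_ratio(ratio: float, sig_figs: int = 3) -> str:
    """Render a positive ratio with about sig_figs significant figures, never in scientific notation."""
    # Number of digits before the decimal point, minus one (negative below 1).
    magnitude = math.floor(math.log10(ratio))
    # Spend the remaining significant figures on decimals, but never go below zero.
    decimals = max(sig_figs - 1 - magnitude, 0)
    return f"{ratio:.{decimals}f}x"


# wall_time of the gpa benchmark above: +1780.3% ± 8.6%
ratio, half_width = delta_to_ratio(1780.3, 8.6)
print(format_ratio(ratio))  # 18.8x

# cache_misses of the gpa benchmark above: +60490.3%
ratio, _ = delta_to_ratio(60490.3, 7.5)
print(format_ratio(ratio))  # 606x
```

With three significant figures, "606x" carries the same information as "+60490.3%" in far fewer characters.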

squeek502 commented 1 year ago

I agree about the hyperfine-style times faster/times slower being easier to understand. My suggestion would be to change the header to:

times faster/slower

instead of

ratio

since the 'times faster/slower' part is necessary context for what something like ⚡2.0x means.

dweiller commented 1 year ago

Faster/slower only works for the wall-time measurement. I think I'd just show the ratio measurement / reference, so it would be good/green if it's significantly < 1, bad/red if it's significantly > 1, and gray when it's not significantly different from 1. My example wasn't great, as everything was worse than the reference.
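A minimal sketch of that colouring rule, in Python for illustration. The thread does not say how "significantly" would be decided, so this assumes a ratio counts as significant when its ± interval excludes 1. The function name and the returned labels are hypothetical.

```python
def classify_ratio(ratio: float, half_width: float) -> str:
    """Classify measurement / reference as 'good', 'bad', or 'neutral'.

    Assumption: the difference is significant when the interval
    [ratio - half_width, ratio + half_width] does not contain 1.
    """
    low, high = ratio - half_width, ratio + half_width
    if high < 1.0:
        return "good"     # whole interval below 1: shown green
    if low > 1.0:
        return "bad"      # whole interval above 1: shown red
    return "neutral"      # interval straddles 1: shown gray


print(classify_ratio(0.322, 0.019))   # good    (mesh wall_time vs gpa: -67.8% ± 1.9%)
print(classify_ratio(18.803, 0.086))  # bad     (gpa wall_time vs zimalloc)
print(classify_ratio(1.02, 0.05))     # neutral (made-up values for illustration)
```

Because lower is better for every measurement poop reports here, the same rule applies to wall_time, peak_rss, and the hardware counters without needing a "faster/slower" label.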