Open 2010YOUY01 opened 6 days ago
Currently, the benchmark scripts only support comparing the results of two benchmark runs; see https://github.com/apache/datafusion/tree/main/benchmarks
Comparing main and mybranch
--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃         main ┃     mybranch ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │    2520.52ms │    2795.09ms │  1.11x slower │
│ QQuery 2     │     222.37ms │     216.01ms │     no change │
│ QQuery 3     │     248.41ms │     239.07ms │     no change │
│ QQuery 4     │     144.01ms │     129.28ms │ +1.11x faster │
│ QQuery 5     │     339.54ms │     327.53ms │     no change │
│ QQuery 6     │     147.59ms │     138.73ms │ +1.06x faster │
│ QQuery 7     │     605.72ms │     631.23ms │     no change │
│ QQuery 8     │     326.35ms │     372.12ms │  1.14x slower │
│ QQuery 9     │     579.02ms │     634.73ms │  1.10x slower │
│ QQuery 10    │     403.38ms │     420.39ms │     no change │
│ QQuery 11    │     201.94ms │     212.12ms │  1.05x slower │
│ QQuery 12    │     235.94ms │     254.58ms │  1.08x slower │
│ QQuery 13    │     738.40ms │     789.67ms │  1.07x slower │
│ QQuery 14    │     198.73ms │     206.96ms │     no change │
│ QQuery 15    │     183.32ms │     179.53ms │     no change │
│ QQuery 16    │     168.57ms │     186.43ms │  1.11x slower │
│ QQuery 17    │    2032.57ms │    2108.12ms │     no change │
│ QQuery 18    │    1912.80ms │    2134.82ms │  1.12x slower │
│ QQuery 19    │     391.64ms │     368.53ms │ +1.06x faster │
│ QQuery 20    │     648.22ms │     691.41ms │  1.07x slower │
│ QQuery 21    │     866.25ms │    1020.37ms │  1.18x slower │
│ QQuery 22    │     115.94ms │     117.27ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┘
It would be useful to display the differences across more than two benchmark runs, for example:
./bench.sh compare main branch1 branch2
In this case, the first run is the baseline, and every following run is compared against it. The result would look like this:
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃         main ┃      branch1 ┃      Change 1 ┃      branch2 ┃      Change 2 ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │    2520.52ms │    2795.09ms │  1.11x slower │    2430.15ms │ +1.04x faster │
│ QQuery 2     │     222.37ms │     216.01ms │     no change │     230.75ms │  1.04x slower │
│ QQuery 3     │     248.41ms │     239.07ms │     no change │     242.33ms │     no change │
│ QQuery 4     │     144.01ms │     129.28ms │ +1.11x faster │     148.45ms │  1.03x slower │
│ QQuery 5     │     339.54ms │     327.53ms │     no change │     319.07ms │ +1.06x faster │
│ QQuery 6     │     147.59ms │     138.73ms │ +1.06x faster │     142.21ms │ +1.04x faster │
│ QQuery 7     │     605.72ms │     631.23ms │     no change │     598.91ms │ +1.01x faster │
│ QQuery 8     │     326.35ms │     372.12ms │  1.14x slower │     390.47ms │  1.20x slower │
│ QQuery 9     │     579.02ms │     634.73ms │  1.10x slower │     602.58ms │  1.04x slower │
│ QQuery 10    │     403.38ms │     420.39ms │     no change │     417.24ms │     no change │
│ QQuery 11    │     201.94ms │     212.12ms │  1.05x slower │     199.07ms │ +1.01x faster │
│ QQuery 12    │     235.94ms │     254.58ms │  1.08x slower │     248.45ms │  1.05x slower │
│ QQuery 13    │     738.40ms │     789.67ms │  1.07x slower │     765.93ms │  1.04x slower │
│ QQuery 14    │     198.73ms │     206.96ms │     no change │     195.21ms │ +1.02x faster │
│ QQuery 15    │     183.32ms │     179.53ms │     no change │     182.04ms │     no change │
│ QQuery 16    │     168.57ms │     186.43ms │  1.11x slower │     172.59ms │  1.02x slower │
│ QQuery 17    │    2032.57ms │    2108.12ms │     no change │    1982.37ms │ +1.03x faster │
│ QQuery 18    │    1912.80ms │    2134.82ms │  1.12x slower │    2057.35ms │  1.08x slower │
│ QQuery 19    │     391.64ms │     368.53ms │ +1.06x faster │     350.72ms │ +1.12x faster │
│ QQuery 20    │     648.22ms │     691.41ms │  1.07x slower │     634.93ms │ +1.02x faster │
│ QQuery 21    │     866.25ms │    1020.37ms │  1.18x slower │    1032.58ms │  1.19x slower │
│ QQuery 22    │     115.94ms │     117.27ms │     no change │     114.35ms │ +1.01x faster │
└──────────────┴──────────────┴──────────────┴───────────────┴──────────────┴───────────────┘
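To make the proposal concrete, here is a rough sketch of how the baseline-versus-many comparison could be computed. It is not the existing comparison script. The names `change_text`, `compare_runs`, and `render` are hypothetical, the input is assumed to be already reduced to one time in milliseconds per query, and the 5% noise threshold is an assumption chosen because it reproduces the labels in the two-run table above.

```python
from typing import Dict, List

# Assumption: differences within 5% of the baseline are reported as "no change".
NOISE_THRESHOLD = 0.05


def change_text(baseline_ms: float, candidate_ms: float,
                noise_threshold: float = NOISE_THRESHOLD) -> str:
    """Describe how candidate_ms compares with baseline_ms."""
    ratio = candidate_ms / baseline_ms
    if abs(ratio - 1.0) <= noise_threshold:
        return "no change"
    if ratio > 1.0:
        return f"{ratio:.2f}x slower"
    return f"+{1.0 / ratio:.2f}x faster"


def compare_runs(runs: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Build table rows. The first entry in `runs` is the baseline;
    every later entry is compared against it."""
    names = list(runs)
    baseline_name, other_names = names[0], names[1:]

    header = ["Query", baseline_name]
    for i, name in enumerate(other_names, start=1):
        header += [name, f"Change {i}"]

    rows = [header]
    for query, base_ms in runs[baseline_name].items():
        row = [query, f"{base_ms:.2f}ms"]
        for name in other_names:
            cand_ms = runs[name].get(query)
            if cand_ms is None:
                # The query is missing from this run, so nothing to compare.
                row += ["n/a", "n/a"]
            else:
                row += [f"{cand_ms:.2f}ms", change_text(base_ms, cand_ms)]
        rows.append(row)
    return rows


def render(rows: List[List[str]]) -> str:
    """Render rows as a plain-text table with right-aligned columns."""
    widths = [max(len(row[col]) for row in rows) for col in range(len(rows[0]))]
    return "\n".join(
        " | ".join(cell.rjust(width) for cell, width in zip(row, widths))
        for row in rows
    )


if __name__ == "__main__":
    # Timings copied from three rows of the tables above.
    runs = {
        "main": {"Query 3": 248.41, "Query 8": 326.35, "Query 19": 391.64},
        "branch1": {"Query 3": 239.07, "Query 8": 372.12, "Query 19": 368.53},
        "branch2": {"Query 3": 242.33, "Query 8": 390.47, "Query 19": 350.72},
    }
    print(render(compare_runs(runs)))
```

With a fixed noise threshold, some of the small differences shown in the mock-up above (for example a 1.01x change) would be reported as "no change", so the threshold would need to be agreed on or made configurable.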