Support multiple (>2) results comparison in benchmark scripts

Is your feature request related to a problem or challenge?

Currently, the benchmark scripts only support comparing the results of two benchmark runs (see https://github.com/apache/datafusion/tree/main/benchmarks):

Comparing main and mybranch
--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃         main ┃     mybranch ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │    2520.52ms │    2795.09ms │  1.11x slower │
│ QQuery 2     │     222.37ms │     216.01ms │     no change │
│ QQuery 3     │     248.41ms │     239.07ms │     no change │
│ QQuery 4     │     144.01ms │     129.28ms │ +1.11x faster │
│ QQuery 5     │     339.54ms │     327.53ms │     no change │
│ QQuery 6     │     147.59ms │     138.73ms │ +1.06x faster │
│ QQuery 7     │     605.72ms │     631.23ms │     no change │
│ QQuery 8     │     326.35ms │     372.12ms │  1.14x slower │
│ QQuery 9     │     579.02ms │     634.73ms │  1.10x slower │
│ QQuery 10    │     403.38ms │     420.39ms │     no change │
│ QQuery 11    │     201.94ms │     212.12ms │  1.05x slower │
│ QQuery 12    │     235.94ms │     254.58ms │  1.08x slower │
│ QQuery 13    │     738.40ms │     789.67ms │  1.07x slower │
│ QQuery 14    │     198.73ms │     206.96ms │     no change │
│ QQuery 15    │     183.32ms │     179.53ms │     no change │
│ QQuery 16    │     168.57ms │     186.43ms │  1.11x slower │
│ QQuery 17    │    2032.57ms │    2108.12ms │     no change │
│ QQuery 18    │    1912.80ms │    2134.82ms │  1.12x slower │
│ QQuery 19    │     391.64ms │     368.53ms │ +1.06x faster │
│ QQuery 20    │     648.22ms │     691.41ms │  1.07x slower │
│ QQuery 21    │     866.25ms │    1020.37ms │  1.18x slower │
│ QQuery 22    │     115.94ms │     117.27ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┘
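The Change column above is consistent with a simple ratio rule. With ratio = branch time / baseline time, values within ±5% are reported as "no change", smaller ratios as "+(1/ratio)x faster", and larger ones as "(ratio)x slower". The sketch below reproduces that rule so the multi-run proposal has something concrete to extend. The helper name `change_label` and the 5% default are assumptions inferred from the numbers in the table, not taken from the scripts themselves.

```python
def change_label(baseline_ms: float, candidate_ms: float, noise_threshold: float = 0.05) -> str:
    """Describe how candidate_ms compares to baseline_ms.

    Hypothetical helper; the 5% noise threshold is inferred from the table above.
    """
    ratio = candidate_ms / baseline_ms
    if 1.0 - noise_threshold <= ratio <= 1.0 + noise_threshold:
        # Differences inside the threshold are treated as noise.
        return "no change"
    if ratio < 1.0:
        # Faster runs are reported as a speedup factor, e.g. "+1.11x faster".
        return f"+{1.0 / ratio:.2f}x faster"
    # Slower runs are reported as a slowdown factor, e.g. "1.11x slower".
    return f"{ratio:.2f}x slower"


# A few rows from the main vs. mybranch table above (times in ms).
rows = {
    "QQuery 1": (2520.52, 2795.09),
    "QQuery 2": (222.37, 216.01),
    "QQuery 4": (144.01, 129.28),
    "QQuery 11": (201.94, 212.12),
}
for query, (main_ms, branch_ms) in rows.items():
    print(f"{query:<10} {change_label(main_ms, branch_ms)}")
```

Running it reproduces the labels shown for those four queries in the table.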

It would be useful to display the differences among multiple benchmark runs in a single table.

Describe the solution you'd like

Support displaying the results of a single benchmark run (just for pretty-printing).

Support comparing benchmark results across multiple runs, for example:

./bench.sh compare main branch1 branch2

In this case, the first run would serve as the baseline, and each subsequent run would be compared against it. The output could look like this:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃         main ┃      branch1 ┃      Change 1 ┃      branch2 ┃      Change 2 ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │    2520.52ms │    2795.09ms │  1.11x slower │    2430.15ms │     no change │
│ QQuery 2     │     222.37ms │     216.01ms │     no change │     230.75ms │     no change │
│ QQuery 3     │     248.41ms │     239.07ms │     no change │     242.33ms │     no change │
│ QQuery 4     │     144.01ms │     129.28ms │ +1.11x faster │     148.45ms │     no change │
│ QQuery 5     │     339.54ms │     327.53ms │     no change │     319.07ms │ +1.06x faster │
│ QQuery 6     │     147.59ms │     138.73ms │ +1.06x faster │     142.21ms │     no change │
│ QQuery 7     │     605.72ms │     631.23ms │     no change │     598.91ms │     no change │
│ QQuery 8     │     326.35ms │     372.12ms │  1.14x slower │     390.47ms │  1.20x slower │
│ QQuery 9     │     579.02ms │     634.73ms │  1.10x slower │     602.58ms │     no change │
│ QQuery 10    │     403.38ms │     420.39ms │     no change │     417.24ms │     no change │
│ QQuery 11    │     201.94ms │     212.12ms │  1.05x slower │     199.07ms │     no change │
│ QQuery 12    │     235.94ms │     254.58ms │  1.08x slower │     248.45ms │  1.05x slower │
│ QQuery 13    │     738.40ms │     789.67ms │  1.07x slower │     765.93ms │     no change │
│ QQuery 14    │     198.73ms │     206.96ms │     no change │     195.21ms │     no change │
│ QQuery 15    │     183.32ms │     179.53ms │     no change │     182.04ms │     no change │
│ QQuery 16    │     168.57ms │     186.43ms │  1.11x slower │     172.59ms │     no change │
│ QQuery 17    │    2032.57ms │    2108.12ms │     no change │    1982.37ms │     no change │
│ QQuery 18    │    1912.80ms │    2134.82ms │  1.12x slower │    2057.35ms │  1.08x slower │
│ QQuery 19    │     391.64ms │     368.53ms │ +1.06x faster │     350.72ms │ +1.12x faster │
│ QQuery 20    │     648.22ms │     691.41ms │  1.07x slower │     634.93ms │     no change │
│ QQuery 21    │     866.25ms │    1020.37ms │  1.18x slower │    1032.58ms │  1.19x slower │
│ QQuery 22    │     115.94ms │     117.27ms │     no change │     114.35ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┴──────────────┴───────────────┘
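A minimal sketch of the proposed N-way comparison follows. It assumes each run's results have already been loaded into a dict mapping query name to elapsed milliseconds; parsing the benchmark JSON files is left out. The helper names `change_label`, `build_rows`, and `render` are hypothetical and are not functions from the existing scripts. The first run is treated as the baseline. Every later run contributes a timing column plus a Change column using the ±5% rule inferred from the two-way table. A single run simply prints its timings, which covers the pretty-printing case. Output is plain text rather than the box-drawing style above.

```python
from typing import Dict, List, Sequence


def change_label(baseline_ms: float, candidate_ms: float, noise_threshold: float = 0.05) -> str:
    # Hypothetical ratio rule: within +/- noise_threshold of the baseline counts as noise.
    ratio = candidate_ms / baseline_ms
    if 1.0 - noise_threshold <= ratio <= 1.0 + noise_threshold:
        return "no change"
    if ratio < 1.0:
        return f"+{1.0 / ratio:.2f}x faster"
    return f"{ratio:.2f}x slower"


def build_rows(names: Sequence[str], runs: Sequence[Dict[str, float]],
               noise_threshold: float = 0.05) -> List[List[str]]:
    """Build table rows; runs[0] is the baseline, each later run adds a Change column."""
    if len(names) != len(runs) or not runs:
        raise ValueError("need one name per run and at least one run")
    header = ["Query", names[0]]
    for i, name in enumerate(names[1:], start=1):
        header += [name, f"Change {i}"]
    rows = [header]
    for query, base_ms in runs[0].items():
        row = [query, f"{base_ms:.2f}ms"]
        for run in runs[1:]:
            ms = run.get(query)
            if ms is None:
                # A later run may lack a query; keep the row aligned with the header.
                row += ["-", "-"]
            else:
                row += [f"{ms:.2f}ms", change_label(base_ms, ms, noise_threshold)]
        rows.append(row)
    return rows


def render(rows: List[List[str]]) -> str:
    # Left-align the query column, right-align every other column.
    widths = [max(len(row[c]) for row in rows) for c in range(len(rows[0]))]
    lines = []
    for row in rows:
        cells = [row[0].ljust(widths[0])]
        cells += [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        lines.append(" | ".join(cells))
    return "\n".join(lines)


# Two queries taken from the example above (times in ms).
main_run = {"QQuery 1": 2520.52, "QQuery 8": 326.35}
branch1_run = {"QQuery 1": 2795.09, "QQuery 8": 372.12}
branch2_run = {"QQuery 1": 2430.15, "QQuery 8": 390.47}

print(render(build_rows(["main", "branch1", "branch2"], [main_run, branch1_run, branch2_run])))
print()
print(render(build_rows(["main"], [main_run])))  # single run: timings only
```

Keeping row construction separate from rendering means the same rows could later be handed to a richer table renderer without touching the comparison logic.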

Describe alternatives you've considered

No response

Additional context

No response

apache / datafusion

Support multiple (>2) results comparison in benchmark scripts #13446