ecraven / r7rs-benchmarks

Benchmarks for various Scheme implementations. Taken with kind permission from the Larceny project, based on the Gabriel and Gambit benchmarks.

Is total-accumulated-runtime unintentionally misleading? #10

Closed michaellenaghan closed 8 years ago

michaellenaghan commented 8 years ago

total-accumulated-runtime shows Chez with a small lead over Gambit, and Gambit with a small lead over Larceny. But tests-finished shows that Chez is only running 45 benchmarks, while Gambit is running 51, and Larceny is running 54. Are Gambit and Larceny taking a hit in that first graph just because they're doing more work?

PS Really, really love this!

ecraven commented 8 years ago

> total-accumulated-runtime shows Chez with a small lead over Gambit, and Gambit with a small lead over Larceny. But tests-finished shows that Chez is only running 45 benchmarks, while Gambit is running 51, and Larceny is running 54. Are Gambit and Larceny taking a hit in that first graph just because they're doing more work?

If you read the description, you'll see that total-accumulated-runtime only accumulates the runtime of the tests that every Scheme managed to finish, so the comparisons are valid. Nonetheless, I removed the entire diagram a few weeks ago, because it is so easy to misinterpret :)
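For readers following along, here is a minimal sketch of that accumulation rule. It is not the repository's actual plotting code: the function name, the data layout, and the implementation names, benchmark names, and timings are all hypothetical, made up purely for illustration.

```python
def total_accumulated_runtime(results):
    """Sum runtimes over only the benchmarks that every implementation finished.

    `results` maps implementation name -> {benchmark name: seconds}.
    A benchmark missing from an implementation's dict is treated as
    "did not finish" (an assumption of this sketch).
    """
    if not results:
        return {}, set()
    # Benchmarks finished by every implementation.
    common = set.intersection(*(set(times) for times in results.values()))
    totals = {
        impl: sum(times[bench] for bench in common)
        for impl, times in results.items()
    }
    return totals, common


# Hypothetical data: impl_a finishes the fewest benchmarks.
results = {
    "impl_a": {"fib": 1.0, "tak": 2.0},
    "impl_b": {"fib": 1.5, "tak": 2.5, "nqueens": 4.0},
    "impl_c": {"fib": 2.0, "tak": 3.0, "nqueens": 5.0, "primes": 6.0},
}

totals, common = total_accumulated_runtime(results)
# Naive per-implementation sums, which would penalize doing more work.
naive = {impl: sum(times.values()) for impl, times in results.items()}
```

With this rule, `impl_b` and `impl_c` are not charged for `nqueens` or `primes`, because `impl_a` never finished them. A naive sum over everything each implementation ran would charge them, which is the misreading raised in the question.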

> PS Really, really love this!

Thanks! I haven't had much time in the last few weeks, but I plan to do more work on this.

michaellenaghan commented 8 years ago

Understood, thanks. (I was looking at https://www.nexoid.at/tmp/scheme-benchmark-r7rs.html which I thought were the official published results, but I realize now that nothing actually says they are.)