sbird opened this issue 6 years ago
All of these are good points. These benchmarks were not chosen with any particular care; we just needed some basic benchmarks to compare our then-nascent Julia implementation with other languages, and we wanted to exercise a good selection of basic programming language features: iteration, recursion, swapping around array elements, parsing, overhead for calling C libraries with small matrices, etc. It's an ongoing struggle to keep compilers from optimizing in ways that defeat the purpose of simple benchmarks like this. It may be time for a significant overhaul of the Julia microbenchmarks suite, but rewriting them in a dozen or so languages is a big job.
Julia versions of all the shootout benchmarks are here:
https://github.com/JuliaCI/BaseBenchmarks.jl/tree/master/src/shootout
These and all the other benchmarks in BaseBenchmarks are run regularly (on request) on PRs to Julia. Comparing their performance over time is useful, but they aren't particularly good benchmarks for comparing across languages, because of the culture around them of doing anything and everything possible to optimize them. That's not what we really want to measure: we're looking to compare reasonable implementations of the same algorithm across different languages. So if one language uses double recursion for Fibonacci, then they all should.
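For concreteness, the doubly-recursive shape in question looks roughly like this in C (a sketch, not the exact code in the repo; the base cases there may differ slightly):

```c
#include <stdio.h>

/* Deliberately naive doubly-recursive Fibonacci: the microbenchmark is
 * meant to exercise recursion and function-call overhead, not to compute
 * Fibonacci numbers efficiently. */
int fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

int main(void) {
    printf("fib(20) = %d\n", fib(20));
    return 0;
}
```

Every language is expected to implement exactly this shape; replacing it with memoization or a closed-form answer would be measuring a different algorithm.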
Ah! That link was exactly what I wanted, thank you!
I understand that, from the point of view of language implementors, these kinds of benchmarks may be less useful than the existing micro suite. But from my perspective as a language user, an example of the most efficient way to implement a given algorithm in a given language is much more useful, because it shows me how to write efficient code in that language!
Clang 6 is able to optimise the Fibonacci test into something like `int fibs[20] = {1, 2, 3, 5, 8, ...}; printf("%d\n", fibs[i]);` and thus completes in essentially zero time. (gcc and gfortran 8 also run the Fibonacci test remarkably fast, which suggests to me that they use a smaller memoization table.)
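For what it's worth, one common way to avoid this is to make the input opaque to the compiler. A minimal sketch of what I mean (the argv approach here is mine, not something the repo does):

```c
#include <stdio.h>
#include <stdlib.h>

int fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

int main(int argc, char **argv) {
    /* Reading n at runtime keeps it unknown at compile time, so the
     * compiler cannot replace the recursion with a precomputed table. */
    int n = (argc > 1) ? atoi(argv[1]) : 20;

    /* A volatile sink keeps the result itself from being folded away. */
    volatile int result = fib(n);
    printf("%d\n", result);
    return 0;
}
```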
The pi_sum test does the same computation 500 times, but gfortran 8 completes it extremely fast, which suggests that it is able to recognise that the loop body repeats the same side-effect-free work and optimise the repetition loop away entirely.
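A sketch of what I mean (the constants here are illustrative, not necessarily the benchmark's), together with one way to make the repetitions impossible to collapse:

```c
#include <stdio.h>

/* Roughly the shape of pi_sum: the outer loop redoes identical,
 * side-effect-free work, so an aggressive optimizer can run the inner
 * sum once and skip the remaining repetitions. */
static double pi_sum_naive(void) {
    double sum = 0.0;
    for (int rep = 0; rep < 500; ++rep) {
        sum = 0.0;
        for (int k = 1; k <= 10000; ++k)
            sum += 1.0 / ((double)k * k);
    }
    return sum;
}

/* Carrying a tiny dependency from one repetition into the next forces
 * every repetition to actually run: floating-point addition is not
 * associative, so the inner sum cannot be factored out of the loop. */
static double pi_sum_guarded(void) {
    double acc = 0.0;
    for (int rep = 0; rep < 500; ++rep) {
        double sum = acc * 1e-12;  /* loop-carried dependency */
        for (int k = 1; k <= 10000; ++k)
            sum += 1.0 / ((double)k * k);
        acc = sum;
    }
    return acc;
}

int main(void) {
    printf("%.10f\n%.10f\n", pi_sum_naive(), pi_sum_guarded());
    return 0;
}
```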
The quicksort test counts the time to generate 5000 random numbers in the Fortran and Julia implementations, but not in the C or Python ones.
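What I would expect is for the random data to be generated before the clock starts in every implementation. A sketch of that timing boundary (using the C library's qsort purely for illustration; the benchmark writes its own quicksort):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void) {
    enum { N = 5000 };
    static double data[N];

    /* Generate the input before starting the clock, so every language
     * times the same work: the sort itself. */
    srand(12345);
    for (int i = 0; i < N; ++i)
        data[i] = (double)rand() / RAND_MAX;

    clock_t t0 = clock();
    qsort(data, N, sizeof data[0], cmp_double);
    clock_t t1 = clock();

    printf("sort time: %.6f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}
```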
The tests that call CBLAS dominate the total runtime and are a little strange: really they are testing whether the language has CBLAS bindings, which is a feature test, not a benchmark.
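To illustrate what I mean: a call like the one below (assuming some CBLAS implementation is installed and linked, e.g. with -lopenblas) spends essentially all of its time in binding and dispatch overhead rather than arithmetic when the matrices are this small, so timing it says more about whether a cheap path to BLAS exists than about the language itself:

```c
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* A 2x2 matrix multiply via CBLAS: for matrices this small the
     * floating-point work is negligible compared to the cost of reaching
     * the library at all. */
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,      /* M, N, K */
                1.0, A, 2,    /* alpha, A, lda */
                B, 2,         /* B, ldb */
                0.0, C, 2);   /* beta, C, ldc */

    printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```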
A more general point is that benchmarks like this, which consist of running extremely simple algorithms many times, are not really an informative way to measure the performance of a language. The point of this repo, as I understand it, is to tell me, a potential Julia user, whether and by how much Julia is likely to be faster than, e.g., Python for my particular scientific problem, but it doesn't do this. Since a large part of the point of Julia is to be faster, it would be useful to have some meaningful way of showing that it is, indeed, faster.
For example, have you considered instead writing Julia implementations of the algorithms here: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ ? That would get you comparisons with other languages for free (and you could have CI that runs them on every commit to catch regressions).