Open 983 opened 4 years ago
@983 amazing investigative work on this. It's interesting to see how the compiler is optimizing the recursive calls. I am open to suggestions on how to improve the fairness of the benchmark.
I can see the following options:
Natively compiled, statically typed
to the Optimized
table.-fno-inline-small-functions
flag when compiling with gcc.__attribute__((noinline))
.The two last options do not completely fix the problem, because gcc will still inline the last recursive call. To get rid of that, also specify -fno-optimize-sibling-calls
.
The Nim, fortran and cython benchmark can be fixed the same way:
nim cpp -d:release --passC:-fno-inline-small-functions --passC:-fno-optimize-sibling-calls fib.nim
gfortran -fno-inline-small-functions -fno-optimize-sibling-calls -O3 -o fib fib.f03
cython --embed -o fib.pyx.c fib.pyx && gcc -fno-inline-functions -fno-optimize-sibling-calls -O3 -o fib fib.pyx.c
In case that this behavior should change in the future, here is my process to determine which flags were responsible:
-f<flag>
by trial and error until gcc optimizes too much. See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html for the list of compiler flags per optimization level.-O3 -fno-<flag>
.Nevertheless, I think a compiler which is very good at inlining should still be honored in some way. Maybe the running time with all optimizations enabled could be included in another column or table?
The average time 10 developers use to find the compiler flags to go from "regularly compiled code" to "optimized code" could be included in the table. This would make the results for the optimized Nim code shine.
I think the best solution is to implement #85 and change the exmamples to take a parameter . This should prevent some of the optimizations made here but still allow for languages to shine that have better inlining. For now, I have added the flag no-inline-small-functions
.
In a way I both agree and disagree with the comment: compiler optimization should not be restricted, if a language allows it, so much the better. I think however the optimized folder contains code that is "human" optimized (see my code with the lisp memoized function I opened in the issues); and that may be cheating. Maybe fibonacci func is not the best to benchmark for that specific reason?
I have moved back to allowing compiler optimizations. However, I will only allow compiler flags that are deemed safe and "release" quality i.e. -O3.
Maybe fibonacci func is not the best to benchmark for that specific reason?
Naive recursive Fibonacci, O(Fib(N)) is definitely not representative of most code. The fact that there's an O(N) algorithm (iterative x+=y; y+=x;
or x+=y; swap(x,y)
) means that there's room for sufficiently clever optimizers to find big improvements.
If you choose naive recursive Fibonacci as your test function in the first place, surely the point is to test how well compilers convert inefficient double-recursion to something like a loop containing single recursion, or even something better. And function-call overhead since a single addition is a tiny amount of work for a single function call.
Doubly-recursive functions for traversing binary trees are common in real programs, but they differ in not having redundancy / common-subexpressions with each other. So that recursion pattern is one that's important for compilers to know how to optimize some.
If you want to know how a language performs in general for a variety of computational problems, naive Fibonacci isn't a good choice, but it is an interesting and well-known problem to benchmark. (https://stackoverflow.com/questions/76611371/execution-time-of-recursive-fibonacci-function-is-slower-in-c-than-equivalent-ni is an example of someone being puzzled by Nim being faster than C on this benchmark.)
Perhaps some text somewhere pointing out that apparently trivial differences can lead to major compiler optimizations could avoid confusing people?
When compiling Nim to C++, this is the simplified result:
Nothing wrong here yet. But when compiled with
g++
, the disassembled program will look similar to this:Apparently,
fib
is only called for the parameters34, 35, 36, 37, 38, 39, 40
and then the results are added up. This is not a fair comparison, because Nim is not doing the same computations as the other languages. I believe that Nim should be moved to the sectionOptimized
. Otherwise, one would have to argue how much unrolling is ok and how much is too much, but there is no good answer for that.Likewise, C and C++ partially unroll the fib function calls, although not as far. I have not checked the Fortran and Cython binaries, but they are probably doing the same thing.
It was suggested in another issue that the parameter
n
should be configurable to prevent optimization. However, that is insufficient since it still allows to unroll thefib
function for small values ofn
. The only solution seems to be to inspect the generated assembly codeHere is a fair assembly implementation to compare against. If a program is faster, it is likely cheating in some way or another.