bddicken / languages

Compare languages

Benchmark Issues #71

Open Brandon-T opened 14 hours ago

Brandon-T commented 14 hours ago
  1. These benchmarks aren't just testing the 1 billion loops; they also test how long it takes the OS to launch the program and how long it takes to print to the console.
  2. They also measure how long the JVM takes to start up, since the benchmark runs java code.java 40, where java has to start a JVM instance and then run the code. Likewise for other interpreted languages such as Python.
  3. They time print statements like printf, which do I/O to stdout (CONOUT$ on Windows), for example. Each language will print differently.
  4. rand in C will outperform Java's Math.random by a lot, because C's rand is not thread-safe while Java's is. rand_r would be the equivalent in C, or ThreadLocalRandom.current().nextInt() in Java.
  5. The benchmarks aren't testing the fastest possible way to do things in each language. In C, you would not initialize the array to 0 and then overwrite each index; you'd skip initialization and just write to the array (points 4 and 5 are sketched right after this list).
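
A minimal C sketch of points 4 and 5 (the array size, values, and seeding are illustrative, not the repo's actual code; rand_r is POSIX rather than ISO C):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000

int main(void) {
    /* Point 4: rand_r uses caller-owned state, so there's no hidden shared
       state; the Java counterpart is ThreadLocalRandom.current().nextInt(). */
    unsigned int seed = (unsigned int) time(NULL);
    int r = rand_r(&seed) % N;

    /* Point 5: no "= {0}" initializer; every slot is written before it is read. */
    int a[N];
    for (int i = 0; i < N; i++) {
        a[i] = i + r;
    }

    printf("%d\n", a[r]);  /* read a result so the work can't be thrown away */
    return 0;
}
```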

Overall, we should be testing only the 1 billion loops by setting up a timer in each language, looping, then taking the difference from start to finish, rather than launching the programs as-is.
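
A rough C sketch of that approach (the clock choice and loop body are placeholders; each language would use its own equivalent, e.g. System.nanoTime in Java or time.perf_counter in Python):

```c
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec start, end;
    volatile long sum = 0;  /* volatile only so this sketch's loop isn't deleted */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < 1000000000L; i++) {
        sum += i % 7;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double seconds = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;

    /* Only the loop is timed; process launch, JVM/interpreter startup,
       and the final prints all fall outside the measured window. */
    fprintf(stderr, "loop: %.3f s\n", seconds);
    printf("%ld\n", sum);
    return 0;
}
```

The same window would wrap the loop in every other language, so runtime startup and printing never enter the measurement.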

A cold start JVM will take quite a bit of time!

C should be compiled with -O3 -funroll-loops -march=native -ftree-vectorize -ffast-math

nervenes commented 12 hours ago

Yeah.. let's solve an already useless benchmark by making it more useless 👍

Seriously though, these types of benchmarks don't benchmark anything, especially when they're compiled: Swift vs Rust vs C vs C++ will just be LLVM vs LLVM doing no iterations because they're optimized away, and they'll always be faster than a language that requires a runtime or can't optimize away the loops, which again resembles barely anything meaningful in the real world.

Brandon-T commented 12 hours ago

@nervenes By your logic, we shouldn't ever benchmark anything at all. The benchmarks test compiler optimizations and instruction generation as well as code constructs -- as do all benchmarks. That's the point. C was created to be as close as possible to bare metal. They'd only be doing no iterations IF the compiler can optimize the loop away because it finds the variable unused, or written to but never read.

they'll always be faster than a language that requires a runtime or can't optimize away the loops, which again

That's a problem with that language/compiler/runtime then. This is literally how we choose what languages to write games in.

The point of my post was to give languages like Java a fairer chance by not testing the JVM launch speed, but rather the code execution itself.

nervenes commented 12 hours ago

No, what you're suggesting is the opposite. To actually benchmark the runtime (i.e. code execution, as you're referring to it) you'd need a long-running process actually computing something, which is neither what you're suggesting nor what the current benchmarks are.

Brandon-T commented 12 hours ago

No, what you're suggesting is the opposite. To actually benchmark the runtime you'd need a long-running process actually computing something, which is neither what you're suggesting nor what the current benchmarks are.

I did suggest NOT to test launching the JVM or Python interpreter as part of the benchmark, and to use a timer + the benchmark code to test the actual code execution.

nervenes commented 12 hours ago

You appear to not understand how this works, at all.

Brandon-T commented 12 hours ago

You appear not to be able to comprehend what I wrote, and are off on a tangent. I'm aware the benchmark is not sorting an array of 1B integers or testing a specific algorithm like quicksort or bubble sort or whatever "real world" test you want it to do. So yes, it does not have a real-world impact and you can't post on a blog that X language's implementation of Y is faster.

But the point of benchmarks is (a) a compiler test, (b) instruction generation, (c) runtime, and (d) optimizations. That applies here. The compiled code for all the languages WILL be faster than the interpreted ones. Everyone knows this.

What you're missing is that not all compiled code is created equal. Not all optimizations are done equally. Not all language rules allow certain optimizations. Not all runtimes or interpreters are created equal. It is totally fair to test how fast Python loops over arrays vs. Java, for example, without doing anything meaningful like checking how fast they sort. That tests the interpreter's speed at processing instructions.

So yes, while it's LLVM vs. LLVM for different languages when you get down to it, it shows the optimizations that can or can't be applied to certain languages or runtimes. We do clang vs. gcc comparisons all the time on meaningless code, just to see what the performance, code generation, optimizations, etc. look like.

You could always take it up with the author if you wanted something "meaningful". Maybe open your own issue telling them you'd like to see something meaningful in the real world instead of a benchmark that just sums integers that could be unrolled or completely optimized away.

TLDR: My point was simply to point out that Java and other runtimes aren't technically treated fairly in THIS benchmark.

nervenes commented 11 hours ago

TLDR: My point was simply to point out that Java and other runtimes aren't technically treated fairly in THIS benchmark.

I'm not disagreeing with this fact. I'm disagreeing with your solution to it.

we should be testing only the 1 billion loops by setting up a timer in each language, looping, then taking the difference from start to finish

This still doesn't solve the fact that the compiled languages will likely optimize away the entire loop while others don't. What you fail to understand is that you're no longer comparing equal work, and you also seem to confuse that with equal codegen, which is not the point. To see whether one language is faster than another, you need to do similar work over a longer period. The current benchmarks, and the one you're arguing to implement, are flawed for the purpose of comparing language performance.

cyrusmsk commented 11 hours ago

Yeah.. let's solve an already useless benchmark by making it more useless 👍

Seriously though, these types of benchmarks don't benchmark anything, especially when they're compiled: Swift vs Rust vs C vs C++ will just be LLVM vs LLVM doing no iterations because they're optimized away, and they'll always be faster than a language that requires a runtime or can't optimize away the loops, which again resembles barely anything meaningful in the real world.

Of course they are not. They depend on many things. For example, here (https://github.com/jinyus/related_post_gen) Java outperforms Swift and other LLVM-based solutions. The issue of measuring time only for the real work and not for I/O, VM startup, and other things is relevant, and that is what the issue author reasonably pointed out.

In a real usage scenario you would probably start the environment only once, and then process requests/data many times in a loop.

cyrusmsk commented 11 hours ago

No, what you're suggesting is the opposite. To actually benchmark the runtime (i.e. code execution, as you're referring to it) you'd need a long-running process actually computing something, which is neither what you're suggesting nor what the current benchmarks are.

But I agree that the calculation in its current form is not very "fair", because some advanced optimizations could just squash the inner loop while others can't. That's why I propose making the calculation trickier, so LLVM can't skip it.
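
For example (just a sketch; the constants are arbitrary), a loop-carried recurrence that depends on runtime input can't be folded into a closed form the way a plain sum can:

```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    /* u comes from the command line, so nothing here is a compile-time constant */
    unsigned int u = (argc > 1) ? (unsigned int) atoi(argv[1]) : 7u;
    unsigned int x = 1;

    for (long j = 0; j < 1000000000L; j++) {
        /* each step depends on the previous result, so the compiler
           cannot replace the whole loop with a closed-form expression;
           "| 1u" just keeps the divisor nonzero */
        x = (x * 1664525u + (unsigned int) j) % (u | 1u);
    }

    printf("%u\n", x);
    return 0;
}
```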

Brandon-T commented 11 hours ago

@nervenes Nothing I suggested makes it no longer compare equal work.

This still doesn't solve the fact that the compiled languages will likely optimize away the entire loop while others don't,

You state this as fact, but it isn't, and it takes a few seconds on Godbolt to verify that with any C or C++ compiler: https://godbolt.org/z/j3GWehxed. The C code compiles to assembly with the loops intact. Clang and GCC also generate completely different code from each other.

You can even add -O3 -funroll-loops -march=native -ftree-vectorize -ffast-math and see it still never gets rid of the loops as long as the array is read from.
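
For instance, here's a reduced version of that pattern (not the repo's exact code): because a[r] is read with a runtime index at the end, the stores into the array can't be treated as dead, so the loops stay.

```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    /* runtime input (assumed a positive integer), so the loop body isn't constant-foldable */
    int u = (argc > 1) ? atoi(argv[1]) : 10;
    srand(42);
    int r = rand() % 10000;

    int a[10000] = {0};
    for (int i = 0; i < 10000; i++) {
        for (int j = 0; j < 100000; j++) {
            a[i] += j % u;
        }
        a[i] += r;
    }

    printf("%d\n", a[r]);  /* reading a[r] with a runtime index keeps the loops alive */
    return 0;
}
```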

This is true for Swift, Rust, and Go as well: the loops don't get optimized away. You CAN technically make that happen, but there's no point at all then, since C's time would literally be 0. Someone above already suggested making the calculations much more complex, but that doesn't negate any of the fixes I mentioned :)

nervenes commented 11 hours ago

@nervenes Nothing I suggested makes it no longer compare equal work.

I know; I'm pointing out that the benchmarks already don't compare equal work.

You state this is fact but it doesn't and it takes a few seconds on godbolt just to verify that with any C or C++ compiler: https://godbolt.org/z/j3GWehxed.

I stand corrected, I shouldn't have used the word "fact".