QBayLogic / benchmark-compilation

Zen 2 CPUs made the benchmark results severely out of date #1

Open deepfire opened 4 years ago

deepfire commented 4 years ago

Zen2 CPUs, such as Ryzen 3900x and 3950x (as well as the newer Intel offerings, such as i9-10900K) made the published comparison severely out of date.

It would be quite nice to have a run of the same benchmarks against the newer options!

There is also quite some intrigue, since:

It's also worth noting that GHC builds scale quite poorly to higher core counts.

christiaanb commented 4 years ago

We'll update the blog post noting that the conclusions are out of date.

deepfire commented 4 years ago

Part of the problem is that the benchmark doesn't seem to pin the dependencies, so reproducibility could be an issue..

Maybe if we reimplement this using Nix.. it'd make the benchmark deployment a breeze, as well.

What do you think, @christiaanb ?

christiaanb commented 4 years ago

Yes, that's definitely an issue. Nix would definitely help there, and perhaps will also enable us to include Haskell compile as a benchmark in https://openbenchmarking.org/ so that sites like Phoronix can run Haskell compile bench whenever a new processor is released.

deepfire commented 4 years ago

@christiaanb, so I've made some progress -- you can take a look at https://github.com/phoronix-test-suite/test-profiles/commit/cadb82b48f6835678f80d9dd3d91ce83ba8a9bb3

Michael included it despite my PR being marked RFC -- I had intended to run it past you first (but not before I had sorted out the minor details..).

Currently, the benchmark consists of just compiling clash-prelude, clash-lib and clash-ghc with GHC-8.10.1.

..and you can already see the summary of preliminary results from the initial, slightly buggy version of the benchmark (https://github.com/phoronix-test-suite/test-profiles/commit/cadb82b48f6835678f80d9dd3d91ce83ba8a9bb3#commitcomment-39685158), which ran three iterations, instead of one, leading to 30 minute+ run times.

I've fixed it since, and Michael already included the new version, so you can see lower numbers already coming up in https://openbenchmarking.org/test/pts/build-clash-1.0.0.

Note, that it doesn't include:

Last, but not least -- my changes to benchmark-compilation are in https://github.com/deepfire/benchmark-compilation, which I can submit as a PR, if you are interested.

deepfire commented 4 years ago

Also, i7-8550U getting ahead of i7-9750H is a god damned puzzler for me..

Maybe cooling was an issue, as is often the case with laptops..

And yes, the 3950X only winning over the same i7-8550U laptop CPU by a very slight margin is also an eye opener -- the 1.5x memory latency that Zen2 has over Intel is definitely an issue..

deepfire commented 4 years ago

New tally, for --iterations 1 runs that are added on openbenchmarking (https://openbenchmarking.org/test/pts/build-clash-1.0.0) -- with the old --iterations 3 results rescaled (and marked with strike-through) for comparability with the new timings -- sorted by clash timing, where available, otherwise by gradle:

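| CPU | ghc -j | phys cores | base, GHz | max, GHz | L3, MB | clash | Java Gradle |
|---|---|---|---|---|---|---|---|
| 10900K | 20 | 10 | 3.7 | 5.3 | 20 | 294 | 188 |
| 9900KS | 16 | 8 | 4.0 | 5.0 | 16 | | 193 |
| 9900K | 16 | 8 | 3.6 | 5.0 | 16 | 321 | |
| 3300X | 8 | 4 | 3.8 | 4.3 | 16 | 354 | |
| 10980XE | 36 | 18 | 3.0 | 4.6 | 24.75 | | 247 |
| 3950X | 32 | 16 | 3.5 | 4.7 | 64 | ~363~ | 251 |
| 3900XT | 24 | 12 | 3.8 | 4.7 | 64 | 364 | 251 |
| 3700X | 16 | 8 | 3.6 | 4.4 | 32 | 367 | |
| 3900X | 24 | 12 | 3.8 | 4.6 | 64 | | 252 |
| 8550U | 8 | 4 | 1.8 | 4.0 | 8 | ~369~ | |
| 2500K | 4 | 4 | 3.3 | 3.7 | 6 | 375 | |
| 8265U | 8 | 4 | 1.6 | 3.9 | 6 | ~375~ | 309 |
| 3960X | 48 | 24 | 3.8 | 4.5 | 128 | 382 | |
| 3200U | 4 | 2 | 2.6 | 3.5 | 4 | | 327 |
| 3990X | 128 | 64 | 2.9 | 4.3 | 256 | 413 | |
| 1065G7 | 8 | 4 | 1.3 | 3.9 | 8 | ~429~ | 362 |
| 9750H | 12 | 6 | 2.6 | 4.5 | 12 | 429 | 220 |
| 5600U | 4 | 2 | 2.6 | 3.2 | 4 | 451 | 367 |
| 4500U | 6 | 6 | 2.3 | 4.0 | 8 | 475 | 217 |
| 4700U | 8 | 8 | 2.0 | 4.1 | 8 | | 224 |
| 3770 | 8 | 4 | 3.4 | 3.9 | 8 | 534 | |
| 2700K | 8 | 4 | 3.5 | 3.9 | 8 | | 297 |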

I also included Java Gradle build timings from openbenchmarking, since that is also compilation by a highly-optimised, GC-based compiler.

UPDATE: added quite a bunch of new CPU results from openbenchmarking.
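
To make the poor scaling with core count a bit more concrete, here is a rough Python sketch that compares core count against clash build time, relative to the 2-core 5600U. It only uses rows from the tally above that list both a clash and a Gradle timing; the names `CLASH_RESULTS` and `relative_to` are made up for this illustration and are not part of the benchmark scripts.

```python
# Clash build times copied from the tally above, restricted to rows
# that list both a clash and a Gradle timing.
# Each entry: CPU -> (physical cores, clash build time as listed)
CLASH_RESULTS = {
    "10900K": (10, 294),
    "3900XT": (12, 364),
    "9750H": (6, 429),
    "5600U": (2, 451),
    "4500U": (6, 475),
}


def relative_to(baseline, results):
    """Return {cpu: (core_ratio, speedup)} relative to the baseline CPU.

    core_ratio is how many times more physical cores the CPU has,
    speedup is how many times faster its clash build finished.
    """
    base_cores, base_time = results[baseline]
    return {
        cpu: (cores / base_cores, base_time / time)
        for cpu, (cores, time) in results.items()
    }


if __name__ == "__main__":
    for cpu, (core_ratio, speedup) in relative_to("5600U", CLASH_RESULTS).items():
        print(f"{cpu:>7}: {core_ratio:.1f}x cores, {speedup:.2f}x faster")
```

If the build scaled anywhere near linearly, the speedup column would track the core ratio; with these numbers, five or six times the cores buys well under twice the speed.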

christiaanb commented 4 years ago

Thanks for putting in all this effort in getting the benchmark into openbenchmarking! Would really welcome the PR. Also gonna run this new script on our machine.

christiaanb commented 4 years ago

I do wonder if we should see whether we can use those "optimised" RTS settings, `-qn8 -A32M`, since the default settings really penalize the high-thread/core-count CPUs, while the optimised ones don't negatively affect the low-thread/core-count CPUs.
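
For reference, `-qn` sets the number of threads used for parallel GC and `-A` sets the allocation area size; both are GHC RTS options and are passed between `+RTS` and `-RTS`. A minimal Python sketch of how a benchmark driver might assemble them follows. The helper name `rts_options` and the choice to cap the GC threads at the job count are assumptions for illustration, not what the benchmark script actually does, so check the result against the RTS documentation for the GHC version in use.

```python
def rts_options(jobs, max_gc_threads=8, alloc_area="32M"):
    """Build the RTS argument segment for a GHC invocation.

    jobs            -- parallelism the build is run with (hypothetical input)
    max_gc_threads  -- upper bound for -qn (8 in the settings quoted above)
    alloc_area      -- value for -A (32M in the settings quoted above)

    Assumption: the GC thread count is capped at the job count so that
    small machines do not request more GC threads than they run jobs.
    """
    gc_threads = min(jobs, max_gc_threads)
    return ["+RTS", f"-qn{gc_threads}", f"-A{alloc_area}", "-RTS"]


if __name__ == "__main__":
    # A 32-thread machine gets the quoted settings unchanged.
    print(" ".join(rts_options(32)))
    # A 4-thread machine gets a smaller GC thread count.
    print(" ".join(rts_options(4)))
```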

deepfire commented 4 years ago

https://github.com/QBayLogic/benchmark-compilation/pull/2 is up!

deepfire commented 4 years ago

I'll add a flag to use the optimised (as well as custom) RTS opts..