Zen 2 CPUs made the benchmark results severely out of date

deepfire commented 4 years ago

Zen 2 CPUs, such as the Ryzen 3900X and 3950X (as well as the newer Intel offerings, such as the i9-10900K), made the published comparison severely out of date.

It would be really nice to have a run of the same benchmarks against the newer options!

There is also quite some intrigue, since:

Intel still has a slight edge in single-thread performance (mostly due to clocks), and also currently has palpably lower memory access latency,
AMD has far larger caches, which can be a huge factor for a GC-based language.

It's also worth noting that GHC builds scale quite poorly to higher core counts.

christiaanb commented 4 years ago

We'll update the blog post noting that the conclusions are out of date.

deepfire commented 4 years ago

Part of the problem is that the benchmark doesn't seem to pin the dependencies, so reproducibility could be an issue..

Maybe if we reimplement this using Nix.. it'd make the benchmark deployment a breeze, as well.

What do you think, @christiaanb ?

christiaanb commented 4 years ago

Yes, that's definitely an issue. Nix would help there, and perhaps it would also enable us to include Haskell compilation as a benchmark on https://openbenchmarking.org/ so that sites like Phoronix can run a Haskell compile benchmark whenever a new processor is released.

deepfire commented 4 years ago

@christiaanb, so I've made some progress -- you can take a look at https://github.com/phoronix-test-suite/test-profiles/commit/cadb82b48f6835678f80d9dd3d91ce83ba8a9bb3

Michael included it despite my PR being marked RFC -- I had intended to run it by you first (but not before I had sorted out the minor details..).

Currently, the benchmark consists of just compiling clash-prelude, clash-lib and clash-ghc with GHC-8.10.1.

..and you can already see the summary of preliminary results from the initial, slightly buggy version of the benchmark (https://github.com/phoronix-test-suite/test-profiles/commit/cadb82b48f6835678f80d9dd3d91ce83ba8a9bb3#commitcomment-39685158), which ran three iterations, instead of one, leading to 30 minute+ run times.

I've fixed it since, and Michael already included the new version, so you can see lower numbers already coming up in https://openbenchmarking.org/test/pts/build-clash-1.0.0.

Note that it doesn't include:

GHC compilation (which would be interesting to add)
using the optimised RTS parameters -- since I assume that defaults are the most interesting, and Phoronix can't publish everything..

Last, but not least -- my changes to compilation-benchmark are in https://github.com/deepfire/benchmark-compilation, which I can submit as a PR, if you are interested.

deepfire commented 4 years ago

Also, i7-8550U getting ahead of i7-9750H is a god damned puzzler for me..

Maybe cooling was an issue, as is often the case with laptops..

And yes, the 3950X only beating the same i7-8550U laptop CPU by a very slight margin is also an eye opener -- the 1.5x memory latency that Zen 2 has compared to Intel is definitely an issue..

deepfire commented 4 years ago

New tally, for --iterations 1 runs that are added on openbenchmarking (https://openbenchmarking.org/test/pts/build-clash-1.0.0) -- with the old --iterations 3 results rescaled (and marked with strike-through) for comparability with the new timings -- sorted by clash timing, where available, otherwise by gradle:

CPU	ghc -j	phys cores	base, GHz	max, GHz	L3, MB	clash	Java Gradle
10900K	20	10	3.7	5.3	20	294	188
9900KS	16	8	4.0	5.0	16		193
9900K	16	8	3.6	5.0	16	321
3300X	8	4	3.8	4.3	16	354
10980XE	36	18	3.0	4.6	24.75		247
3950X	32	16	3.5	4.7	64	~363~	251
3900XT	24	12	3.8	4.7	64	364	251
3700X	16	8	3.6	4.4	32	367
3900X	24	12	3.8	4.6	64		252
8550U	8	4	1.8	4.0	8	~369~
2500K	4	4	3.3	3.7	6	375
8265U	8	4	1.6	3.9	6	~375~	309
3960X	48	24	3.8	4.5	128	382
3200U	4	2	2.6	3.5	4		327
3990X	128	64	2.9	4.3	256	413
1065G7	8	4	1.3	3.9	8	~429~	362
9750H	12	6	2.6	4.5	12	429	220
5600U	4	2	2.6	3.2	4	451	367
4500U	6	6	2.3	4.0	8	475	217
4700U	8	8	2.0	4.1	8		224
3770	8	4	3.4	3.9	8	534
2700K	8	4	3.5	3.9	8		297

I also included Java Gradle build timings from openbenchmarking, since that is also compilation by a highly optimised GC-based compiler.

UPDATE: added quite a bunch of new CPU results from openbenchmarking.
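
For anyone wanting to reproduce the rescaling of the old --iterations 3 results: the thread doesn't spell out the method, so the following is only a minimal sketch under the assumption that an old figure is the total over three equal-cost iterations and is simply divided by the iteration count. The function name and the sample number are hypothetical, not taken from openbenchmarking.

```python
def rescale_to_single_iteration(total_time: float, iterations: int) -> float:
    """Estimate the time of one iteration from a multi-iteration total.

    Assumption: `total_time` is the sum over `iterations` runs of equal cost.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return total_time / iterations


# Hypothetical total for an old --iterations 3 run (not a real result).
old_total = 900.0
print(rescale_to_single_iteration(old_total, 3))
```

If the first iteration was slower than the others (cold caches, dependency downloads), a plain division would underestimate a true single-iteration run, which is one reason to keep the rescaled numbers marked separately.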

christiaanb commented 4 years ago

Thanks for putting in all this effort in getting the benchmark into openbenchmarking! Would really welcome the PR. Also gonna run this new script on our machine.

christiaanb commented 4 years ago

I do wonder if we should see whether we can use those "optimised" RTS settings, -qn8 -A32M, since the default settings really penalize the high-thread/core-count CPUs, while the optimised ones don't negatively affect the low-thread/core-count CPUs.
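
To make the difference concrete, here is a rough sketch of how a benchmark driver might build the GHC command line with and without those settings. GHC's -j flag and the +RTS ... -RTS delimiters are real; -qn8 caps the number of GC threads at 8 and -A32M sets a 32 MB allocation area. The helper name `ghc_command` is hypothetical and is not how the benchmark script actually does it.

```python
from typing import List, Optional


def ghc_command(jobs: int, rts_opts: Optional[List[str]] = None) -> List[str]:
    """Build a GHC argument list, optionally wrapping RTS options.

    Hypothetical helper for illustration; not part of the benchmark script.
    """
    cmd = ["ghc", f"-j{jobs}"]
    if rts_opts:
        # RTS options for the compiler itself go between +RTS and -RTS.
        cmd += ["+RTS", *rts_opts, "-RTS"]
    return cmd


# Default RTS settings (what the openbenchmarking profile measures).
print(" ".join(ghc_command(8)))
# The "optimised" settings discussed above.
print(" ".join(ghc_command(8, ["-qn8", "-A32M"])))
```

Keeping the RTS options as a plain list makes it easy to compare the default and tuned runs on the same machine.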

deepfire commented 4 years ago

https://github.com/QBayLogic/benchmark-compilation/pull/2 is up!

deepfire commented 4 years ago

I'll add a flag to use the optimised (as well as custom) RTS opts..
