Open ThomasArts opened 3 months ago
Thanks, I have a rough idea of what might have caused this. I'll have a deeper look at this later in the week. :-)
I can reproduce a slight slowdown with https://github.com/erlang/otp/commit/24ef4cbaeda9b9c26682cba75f2f15b0c58722aa, but I'm really puzzled to see it on https://github.com/erlang/otp/commit/8504d0e0b84f57950c94bf6244a9699102893b7d as https://github.com/erlang/otp/commit/3fcd74cdbb094de048623b0da5a3aac0e0b7335c fixes it completely for me.
What platform is this? Have you turned off all kinds of CPU frequency scaling?
I've opened #8347, which should improve things a bit. I'm seeing slightly better results locally, but I'm still a bit puzzled as to why it was so much slower on your machine.
I've run Thomas's test on my machines:
Linux, i9-10900X CPU @ 3.70GHz | 26.2.3 | master | pr8347 |
---|---|---|---|
100 | 139 | 143 | 123 |
1000 | 1580 | 1615 | 1780 |
10000 | 18500 | 18160 | 17250 |
M1 Pro, macOS 14.4.1 | 26.2.2 | master | pr8347 |
---|---|---|---|
100 | 170 | 174 | 173 |
1000 | 2240 | 2330 | 2340 |
10000 | 29400 | 30500 | 30400 |
Just to be sure, I ran the test again on a Mac M2 Pro (Ventura 13.6), with results comparable to the earlier run:
M2 Pro, macOS 13.6 (32 GB) | 26.2.1 | 26.2.2 | master (8504d0e0b8) | pr8347 (436568a31bf) |
---|---|---|---|---|
100 | 80 | 81 | 95 | 95 |
1000 | 920 | 930 | 1050 | 1060 |
10000 | 13400 | 13400 | 15700 | 15200 |
I'll try to build pr8347 as well to see if I can relate to the above numbers. It might, of course, be something M2-specific. I built from source.
This is tangential to the main topic here, but how come your M2 Pro is twice as fast as my M1 Pro @ThomasArts? :)
According to generic benchmarks, there should be no more than 10-20% performance difference between these chips. After I saw your results, I compiled Erlang without extra microstate accounting and without DTrace support, to avoid any overhead from these features. I also made sure to test without using the shell (`erlperf -pa . 'ppp:test(100).'`) but I still get ~170ms results. Do you do anything special when running your tests? What flags have you compiled Erlang with?
I don't see you on the Erlanger slack. If you don't mind joining and chatting there for a bit, I'd appreciate that. Hopefully we can all learn something about Erlang/OTP performance on Apple Silicon (or in general, based on what we find).
> This is tangential to the main topic here, but how come your M2 Pro is twice as fast as my M1 Pro
Have you compiled all Erlang code you are running with the Erlang compiler in Erlang/OTP 27?
The format of the type information in BEAM files is different in OTP 26 and OTP 27. If the format of the type information is not the expected one, the JIT cannot do any type-guided optimizations, and will potentially emit worse code.
This is just a wild guess.
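One way to check this guess is to compare the compiler version recorded in a loaded module with the `compiler` application that ships with the running system. The following is only a sketch: the module name `compiler_check` and its function names are made up for illustration, and it assumes the module was not stripped of its compile info and that the `compiler` application is installed.

```erlang
-module(compiler_check).
-export([compiled_with/1, matches_runtime/1]).

%% Version of the `compiler` application that produced the loaded module
%% Mod, as recorded in its compile info, or `undefined` if it is missing.
compiled_with(Mod) ->
    proplists:get_value(version, Mod:module_info(compile)).

%% True when Mod was compiled by the same compiler version as the one
%% shipped with the currently running Erlang/OTP installation.
%% Assumption: the `compiler` application is available; otherwise the
%% match on {ok, _} below fails with a badmatch error.
matches_runtime(Mod) ->
    _ = application:load(compiler),
    {ok, RuntimeVsn} = application:get_key(compiler, vsn),
    compiled_with(Mod) =:= RuntimeVsn.
```

For example, `compiler_check:matches_runtime(ppp)` returning `false` would suggest that `ppp.beam` was built by a different release than the one running it.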
Yes, I compile the code with the version I use at runtime. I built from master today, compiled `ppp.erl` with this version, ran it with `erlperf` as above and my times are ~170ms, which is like twice what Thomas gets.
Oh, I found the culprit. When debugging a different issue recently, I set the CFLAGS to `-O0`. After setting them back to `-O3`, I get the ~87ms.
Just as an anecdote: the issue I was debugging turned out to be a problem with running a Docker image built for x86 on an ARM MacBook. Thanks to all the Docker/Rosetta2 magic, that generally works, but recently we made some changes in RabbitMQ that probably triggered some bug (in Rosetta2, I guess? Not sure, too much low-level magic). Anyway, we get some crazy failures where the reported stacktraces don't match the source code, which is why I suspected that maybe GCC optimisations had something to do with it. The problem still exists, but since one needs to run an "incorrect" architecture of an image, I haven't reported it. Instead, we are working on having ARM builds of the Docker images in our pipelines (they've been available for the final releases for a long time, I'm just talking about our internal builds).
I'm interested in figuring out a way to tell which options were used to build ERTS. Specifically, I want `erlperf` to output some sort of warning along the lines of "your ERTS is built with these flags". That'd prevent situations like the one described above.
@max-au there is `erlang:system_info(compile_info)` that returns a proplist with cflags, ldflags and config_h flags. Maybe that could help?
1> proplists:get_value(cflags, erlang:system_info(compile_info)).
"-Werror=undef -Werror=implicit -Werror=return-type -fno-common -g -O2
-I...asdf_24.3.4.16/otp_src_24.3.4.16/erts/x86_64-apple-darwin22.6.0
-DHAVE_CONFIG_H
-Wall -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Wdeclaration-after-statement
-DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT -DPOSIX_THREADS -DBEAMASM=1"
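Building on that, a tool could scan the cflags for the optimisation level and warn when the emulator was built with `-O0`. This is a minimal sketch rather than anything `erlperf` actually does: the module name `erts_flags` and its functions are hypothetical, and it assumes the cflags are a whitespace-separated string as in the output above.

```erlang
-module(erts_flags).
-export([cflags/0, optimisation_level/0, warn_if_unoptimised/0]).

%% C compiler flags the running emulator was built with ("" if absent).
cflags() ->
    proplists:get_value(cflags, erlang:system_info(compile_info), "").

%% The last -O<level> flag found in the cflags, or `undefined` if there is
%% none. The last occurrence is taken because GCC and Clang let a later
%% -O flag override earlier ones.
optimisation_level() ->
    Flags = string:lexemes(cflags(), " \t\n"),
    case [F || "-O" ++ _ = F <- Flags] of
        [] -> undefined;
        Levels -> lists:last(Levels)
    end.

%% Prints a warning to standard error and returns `warning` when ERTS was
%% built with -O0; returns `ok` otherwise.
warn_if_unoptimised() ->
    case optimisation_level() of
        "-O0" ->
            io:format(standard_error,
                      "Warning: ERTS was built with -O0; cflags: ~ts~n",
                      [cflags()]),
            warning;
        _ ->
            ok
    end.
```

Calling `erts_flags:warn_if_unoptimised()` before a benchmark run would have flagged the `-O0` build that caused the ~170ms results discussed earlier.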
Describe the bug
It seems that `binary_to_term` got slower.

To Reproduce
I used the following simple way to generate a rather large term and then N copies of that term as a binary in a list. On each of the copies we run `binary_to_term`. The expectation is that it is as fast as on OTP-26.1.2... but it isn't.

On OTP26, roughly:

But on the latest master (`source-8504d0e0b8`) the 10k test gets much slower!

Test program

Expected behavior
Equal speeds are expected.

Affected versions
This seems to have been introduced in commit 24ef4cbaed for OTP27. The last commit with faster times is 49024e83a2.

Additional context
This was observed when testing riak and may be related to issue https://github.com/erlang/otp/issues/8229
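
A minimal sketch of a benchmark along the lines described under To Reproduce might look like the following. It is not the original `ppp.erl`: the module name `b2t_sketch`, the shape of the generated term, and the timing approach are all assumptions made for illustration.

```erlang
-module(b2t_sketch).
-export([test/1]).

%% Hypothetical stand-in for the "rather large term" in the report:
%% a map with 1000 keys, each holding a small list of nested data.
large_term() ->
    maps:from_list([{N, [{N, integer_to_binary(N), lists:seq(1, 10)}]}
                    || N <- lists:seq(1, 1000)]).

%% Encodes the term once, puts N references to that binary in a list,
%% decodes every element with binary_to_term/1, and returns the elapsed
%% wall-clock time in milliseconds.
test(N) ->
    Bin = term_to_binary(large_term()),
    Bins = lists:duplicate(N, Bin),
    Decode = fun(B) -> _ = binary_to_term(B), ok end,
    {Micros, ok} = timer:tc(fun() -> lists:foreach(Decode, Bins) end),
    Micros div 1000.
```

Running `b2t_sketch:test(100)`, `test(1000)` and `test(10000)` on each build gives numbers that can be compared across OTP versions, though absolute values will differ from those reported in this thread.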