Closed vaeng closed 8 months ago
Hej!
When I ran the tests on my computer I got significantly different results:
-O3
cpupower frequency-set --governor performance
to avoid any interference from the processorbenchmark_leap --benchmark_repetitions=100
On my computer (with a AMD Ryzen 1920X) I got these results (slightly shortened):
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------
BM_Leap_boolean_chain_mean 1925 ns 1924 ns 100
BM_Leap_boolean_chain_median 1952 ns 1952 ns 100
BM_Leap_boolean_chain_stddev 37.9 ns 37.9 ns 100
BM_Leap_boolean_chain_cv 1.97 % 1.97 % 100
BM_Leap_boolean_chain_inverse_mean 1213 ns 1213 ns 100
BM_Leap_boolean_chain_inverse_median 1210 ns 1210 ns 100
BM_Leap_boolean_chain_inverse_stddev 20.2 ns 20.2 ns 100
BM_Leap_boolean_chain_inverse_cv 1.67 % 1.67 % 100
BM_Leap_ternary_mean 976 ns 976 ns 100
BM_Leap_ternary_median 968 ns 968 ns 100
BM_Leap_ternary_stddev 19.6 ns 19.6 ns 100
BM_Leap_ternary_cv 2.01 % 2.01 % 100
BM_Leap_chrono_mean 949 ns 949 ns 100
BM_Leap_chrono_median 947 ns 947 ns 100
BM_Leap_chrono_stddev 7.14 ns 7.11 ns 100
BM_Leap_chrono_cv 0.75 % 0.75 % 100
BM_Leap_boost_mean 1685 ns 1685 ns 100
BM_Leap_boost_median 1683 ns 1683 ns 100
BM_Leap_boost_stddev 9.90 ns 9.93 ns 100
BM_Leap_boost_cv 0.59 % 0.59 % 100
BM_empty_read_mean 0.000 ns 0.000 ns 100
BM_empty_read_median 0.000 ns 0.000 ns 100
BM_empty_read_stddev 0.000 ns 0.000 ns 100
BM_empty_read_cv 1.78 % 1.20 % 100
It looks like "boolean_chain" is among the slowest approaches, "boolean_chain_inverse" is significantly faster but "ternary" beats both. That's surprising. Or am I misinterpreting the numbers?
BM_Leap_boolean_chain_median 1952 ns BM_Leap_boolean_chain_inverse_mean 1213 ns
That is strange. I used g++ leap_benchmark.cpp -std=c++20 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread -o leap_benchmark
, compiled with g++ version 14.10.0. So the article's values are without -O3
.
I can't change my CPU to performance, as the VM does not support it, but that should not change things so drastically. I reran the tests many times and always got those results. Even with -O3
.
Another go:
❯ gcc leap_benchmark.cpp -std=gnu++2a -O3 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread -o leap_benchmark -lstdc++ -lm
❯ ./leap_benchmark --benchmark_report_aggregates_only=true --benchmark_repetitions=10
BM_Leap_boolean_chain_mean 937 ns 937 ns 10
BM_Leap_boolean_chain_median 937 ns 937 ns 10
BM_Leap_boolean_chain_stddev 4.00 ns 4.01 ns 10
BM_Leap_boolean_chain_cv 0.43 % 0.43 % 10
BM_Leap_boolean_chain_inverse_mean 1334 ns 1334 ns 10
BM_Leap_boolean_chain_inverse_median 1333 ns 1333 ns 10
BM_Leap_boolean_chain_inverse_stddev 3.36 ns 3.32 ns 10
BM_Leap_boolean_chain_inverse_cv 0.25 % 0.25 % 10
BM_Leap_ternary_mean 1060 ns 1060 ns 10
BM_Leap_ternary_median 1062 ns 1061 ns 10
BM_Leap_ternary_stddev 2.95 ns 2.90 ns 10
BM_Leap_ternary_cv 0.28 % 0.27 % 10
BM_Leap_chrono_mean 1144 ns 1144 ns 10
BM_Leap_chrono_median 1143 ns 1143 ns 10
BM_Leap_chrono_stddev 4.75 ns 4.75 ns 10
BM_Leap_chrono_cv 0.42 % 0.42 % 10
BM_Leap_boost_mean 1018 ns 1018 ns 10
BM_Leap_boost_median 1018 ns 1018 ns 10
BM_Leap_boost_stddev 2.51 ns 2.45 ns 10
BM_Leap_boost_cv 0.25 % 0.24 % 10
BM_empty_read_mean 0.614 ns 0.614 ns 10
BM_empty_read_median 0.614 ns 0.614 ns 10
BM_empty_read_stddev 0.001 ns 0.001 ns 10
BM_empty_read_cv 0.13 % 0.12 % 10
```text
Finally managed to run benchmarks on my old, slow computer 😉
$ ./leap_benchmark --benchmark_report_aggregates_only=true --benchmark_repetitions=10
2024-01-11T07:30:31-05:00
Running ./leap_benchmark
Run on (4 X 2712 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 256 KiB (x2)
L3 Unified 3072 KiB (x1)
Load Average: 0.85, 0.63, 0.62
-------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------
BM_Leap_boolean_chain_mean 17235 ns 17235 ns 10
BM_Leap_boolean_chain_median 17157 ns 17157 ns 10
BM_Leap_boolean_chain_stddev 193 ns 193 ns 10
BM_Leap_boolean_chain_cv 1.12 % 1.12 % 10
BM_Leap_boolean_chain_inverse_mean 35652 ns 35652 ns 10
BM_Leap_boolean_chain_inverse_median 34359 ns 34359 ns 10
BM_Leap_boolean_chain_inverse_stddev 2747 ns 2747 ns 10
BM_Leap_boolean_chain_inverse_cv 7.70 % 7.70 % 10
BM_Leap_ternary_mean 21083 ns 21084 ns 10
BM_Leap_ternary_median 20425 ns 20425 ns 10
BM_Leap_ternary_stddev 1350 ns 1350 ns 10
BM_Leap_ternary_cv 6.40 % 6.40 % 10
BM_Leap_chrono_mean 20505 ns 20506 ns 10
BM_Leap_chrono_median 19862 ns 19863 ns 10
BM_Leap_chrono_stddev 1577 ns 1577 ns 10
BM_Leap_chrono_cv 7.69 % 7.69 % 10
BM_Leap_boost_mean 36426 ns 36427 ns 10
BM_Leap_boost_median 35927 ns 35927 ns 10
BM_Leap_boost_stddev 824 ns 824 ns 10
BM_Leap_boost_cv 2.26 % 2.26 % 10
BM_empty_read_mean 1834 ns 1834 ns 10
BM_empty_read_median 1833 ns 1833 ns 10
BM_empty_read_stddev 5.73 ns 5.71 ns 10
BM_empty_read_cv 0.31 % 0.31 % 10
Looks like on my computer, boost
is much slower than chrono
, which differs from @vaeng's results. Aside from that it seems to be similar (slower but proportional).
@clechasseur at least chain and inverse chain are in a "sensible" order.
What do you and @siebenschlaefer think? Should I add or edit some of the article, or let it stand as it is for my machine?
@ErikSchierboom have you had diverging benchmarks for an approach article before?
When I read the different approach documents for the Leap exercise, I wondered whether results could vary depending on how the CPU handles pipelining. The "inverse chain" is an example; on paper, it looks worse because the edge case is checked fist. But what if instructions are reordered in some way? I am wondering whether @siebenschlaefer's results can be explained by the fact that they're using an AMD processor (I am on Intel).
I will also admit to not being a fan of such very low-level optimizations in general. Optimizing a method to determine if a year is a leap year seems a bit extreme: is that really the performance bottleneck of your program? Let's say that it's true that the ternary approach is 27% slower; it's 27% of such an infinitesimal amount of time, does it really matter? But I will leave it to you to determine whether the article should be posted: in itself, it seems fine.
AMD processor (I am on Intel)
My results are AMD as well.
Let's say that it's true that the ternary approach is 27% slower; it's 27% of such an infinitesimal amount of time, does it really matter?
I should include that into the article.
@ErikSchierboom have you had diverging benchmarks for an approach article before?
Nope, but I've been the only one doing them on my articles :D
I think the wording in the article is a good summary of our discussion here @siebenschlaefer @ErikSchierboom. I'd like to merge the article to see the new invert mechanism :)
Merged. Everyone feel free to do follow ups if appropriate but it's good for us to test this -invertible
functionality before Tues :)
To make the links work, this should be merged after #759:heavy_check_mark:To make the image work, https://github.com/exercism/images/pull/24 should be merged beforehand. :heavy_check_mark: