feat(leap): add performance article

vaeng commented 9 months ago

~~To make the links work, this should be merged after #759~~ :heavy_check_mark:
~~To make the image work, https://github.com/exercism/images/pull/24 should be merged beforehand~~. :heavy_check_mark:

siebenschlaefer commented 9 months ago

Hej!

When I ran the tests on my computer I got significantly different results:

I compiled with GCC-13.2.0 and -O3
I ran cpupower frequency-set --governor performance to avoid any interference from the processor
I ran benchmark_leap --benchmark_repetitions=100

On my computer (with a AMD Ryzen 1920X) I got these results (slightly shortened):

Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_Leap_boolean_chain_mean                 1925 ns         1924 ns          100
BM_Leap_boolean_chain_median               1952 ns         1952 ns          100
BM_Leap_boolean_chain_stddev               37.9 ns         37.9 ns          100
BM_Leap_boolean_chain_cv                   1.97 %          1.97 %           100
BM_Leap_boolean_chain_inverse_mean         1213 ns         1213 ns          100
BM_Leap_boolean_chain_inverse_median       1210 ns         1210 ns          100
BM_Leap_boolean_chain_inverse_stddev       20.2 ns         20.2 ns          100
BM_Leap_boolean_chain_inverse_cv           1.67 %          1.67 %           100
BM_Leap_ternary_mean                        976 ns          976 ns          100
BM_Leap_ternary_median                      968 ns          968 ns          100
BM_Leap_ternary_stddev                     19.6 ns         19.6 ns          100
BM_Leap_ternary_cv                         2.01 %          2.01 %           100
BM_Leap_chrono_mean                         949 ns          949 ns          100
BM_Leap_chrono_median                       947 ns          947 ns          100
BM_Leap_chrono_stddev                      7.14 ns         7.11 ns          100
BM_Leap_chrono_cv                          0.75 %          0.75 %           100
BM_Leap_boost_mean                         1685 ns         1685 ns          100
BM_Leap_boost_median                       1683 ns         1683 ns          100
BM_Leap_boost_stddev                       9.90 ns         9.93 ns          100
BM_Leap_boost_cv                           0.59 %          0.59 %           100
BM_empty_read_mean                        0.000 ns        0.000 ns          100
BM_empty_read_median                      0.000 ns        0.000 ns          100
BM_empty_read_stddev                      0.000 ns        0.000 ns          100
BM_empty_read_cv                           1.78 %          1.20 %           100

It looks like "boolean_chain" is among the slowest approaches, "boolean_chain_inverse" is significantly faster but "ternary" beats both. That's surprising. Or am I misinterpreting the numbers?

vaeng commented 9 months ago

BM_Leap_boolean_chain_median 1952 ns BM_Leap_boolean_chain_inverse_mean 1213 ns

That is strange. I used g++ leap_benchmark.cpp -std=c++20 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread -o leap_benchmark, compiled with g++ version 14.10.0. So the article's values are without -O3.

I can't change my CPU to performance, as the VM does not support it, but that should not change things so drastically. I reran the tests many times and always got those results. Even with -O3.

Another go:


❯ gcc leap_benchmark.cpp -std=gnu++2a -O3 -isystem benchmark/include  -Lbenchmark/build/src -lbenchmark -lpthread -o leap_benchmark -lstdc++ -lm
❯ ./leap_benchmark --benchmark_report_aggregates_only=true --benchmark_repetitions=10
BM_Leap_boolean_chain_mean                  937 ns          937 ns           10
BM_Leap_boolean_chain_median                937 ns          937 ns           10
BM_Leap_boolean_chain_stddev               4.00 ns         4.01 ns           10
BM_Leap_boolean_chain_cv                   0.43 %          0.43 %            10
BM_Leap_boolean_chain_inverse_mean         1334 ns         1334 ns           10
BM_Leap_boolean_chain_inverse_median       1333 ns         1333 ns           10
BM_Leap_boolean_chain_inverse_stddev       3.36 ns         3.32 ns           10
BM_Leap_boolean_chain_inverse_cv           0.25 %          0.25 %            10
BM_Leap_ternary_mean                       1060 ns         1060 ns           10
BM_Leap_ternary_median                     1062 ns         1061 ns           10
BM_Leap_ternary_stddev                     2.95 ns         2.90 ns           10
BM_Leap_ternary_cv                         0.28 %          0.27 %            10
BM_Leap_chrono_mean                        1144 ns         1144 ns           10
BM_Leap_chrono_median                      1143 ns         1143 ns           10
BM_Leap_chrono_stddev                      4.75 ns         4.75 ns           10
BM_Leap_chrono_cv                          0.42 %          0.42 %            10
BM_Leap_boost_mean                         1018 ns         1018 ns           10
BM_Leap_boost_median                       1018 ns         1018 ns           10
BM_Leap_boost_stddev                       2.51 ns         2.45 ns           10
BM_Leap_boost_cv                           0.25 %          0.24 %            10
BM_empty_read_mean                        0.614 ns        0.614 ns           10
BM_empty_read_median                      0.614 ns        0.614 ns           10
BM_empty_read_stddev                      0.001 ns        0.001 ns           10
BM_empty_read_cv                           0.13 %          0.12 %            10
```text

clechasseur commented 9 months ago

Finally managed to run benchmarks on my old, slow computer 😉

$ ./leap_benchmark --benchmark_report_aggregates_only=true --benchmark_repetitions=10
2024-01-11T07:30:31-05:00
Running ./leap_benchmark
Run on (4 X 2712 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 256 KiB (x2)
  L3 Unified 3072 KiB (x1)
Load Average: 0.85, 0.63, 0.62
-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_Leap_boolean_chain_mean                17235 ns        17235 ns           10
BM_Leap_boolean_chain_median              17157 ns        17157 ns           10
BM_Leap_boolean_chain_stddev                193 ns          193 ns           10
BM_Leap_boolean_chain_cv                   1.12 %          1.12 %            10
BM_Leap_boolean_chain_inverse_mean        35652 ns        35652 ns           10
BM_Leap_boolean_chain_inverse_median      34359 ns        34359 ns           10
BM_Leap_boolean_chain_inverse_stddev       2747 ns         2747 ns           10
BM_Leap_boolean_chain_inverse_cv           7.70 %          7.70 %            10
BM_Leap_ternary_mean                      21083 ns        21084 ns           10
BM_Leap_ternary_median                    20425 ns        20425 ns           10
BM_Leap_ternary_stddev                     1350 ns         1350 ns           10
BM_Leap_ternary_cv                         6.40 %          6.40 %            10
BM_Leap_chrono_mean                       20505 ns        20506 ns           10
BM_Leap_chrono_median                     19862 ns        19863 ns           10
BM_Leap_chrono_stddev                      1577 ns         1577 ns           10
BM_Leap_chrono_cv                          7.69 %          7.69 %            10
BM_Leap_boost_mean                        36426 ns        36427 ns           10
BM_Leap_boost_median                      35927 ns        35927 ns           10
BM_Leap_boost_stddev                        824 ns          824 ns           10
BM_Leap_boost_cv                           2.26 %          2.26 %            10
BM_empty_read_mean                         1834 ns         1834 ns           10
BM_empty_read_median                       1833 ns         1833 ns           10
BM_empty_read_stddev                       5.73 ns         5.71 ns           10
BM_empty_read_cv                           0.31 %          0.31 %            10

Looks like on my computer, boost is much slower than chrono, which differs from @vaeng's results. Aside from that it seems to be similar (slower but proportional).

vaeng commented 9 months ago

@clechasseur at least chain and inverse chain are in a "sensible" order.

What do you and @siebenschlaefer think? Should I add or edit some of the article, or let it stand as it is for my machine?

vaeng commented 8 months ago

@ErikSchierboom have you had diverging benchmarks for an approach article before?

clechasseur commented 8 months ago

When I read the different approach documents for the Leap exercise, I wondered whether results could vary depending on how the CPU handles pipelining. The "inverse chain" is an example; on paper, it looks worse because the edge case is checked fist. But what if instructions are reordered in some way? I am wondering whether @siebenschlaefer's results can be explained by the fact that they're using an AMD processor (I am on Intel).

I will also admit to not being a fan of such very low-level optimizations in general. Optimizing a method to determine if a year is a leap year seems a bit extreme: is that really the performance bottleneck of your program? Let's say that it's true that the ternary approach is 27% slower; it's 27% of such an infinitesimal amount of time, does it really matter? But I will leave it to you to determine whether the article should be posted: in itself, it seems fine.

vaeng commented 8 months ago

AMD processor (I am on Intel)

My results are AMD as well.

Let's say that it's true that the ternary approach is 27% slower; it's 27% of such an infinitesimal amount of time, does it really matter?

I should include that into the article.

ErikSchierboom commented 8 months ago

@ErikSchierboom have you had diverging benchmarks for an approach article before?

Nope, but I've been the only one doing them on my articles :D

vaeng commented 8 months ago

I think the wording in the article is a good summary of our discussion here @siebenschlaefer @ErikSchierboom. I'd like to merge the article to see the new invert mechanism :)

iHiD commented 8 months ago

Merged. Everyone feel free to do follow ups if appropriate but it's good for us to test this -invertible functionality before Tues :)

exercism / cpp

feat(leap): add performance article #760