flatsurf / e-antic

Embedded algebraic number fields
https://flatsurf.github.io/e-antic/libeantic/
GNU Lesser General Public License v3.0
12 stars 11 forks source link

speed up arithmetic #180

Closed saraedum closed 3 years ago

saraedum commented 3 years ago

I tried to improve the performance of some typical C++ operations.

Below are the timings. Unfortunately, there is a lot of noise in these measurements.

--------------------------------------------------------------------------------------------------
Benchmark                                                       Before        After      Quotient
--------------------------------------------------------------------------------------------------
ConstructGenerator/1                                           93.0 ns      2.42 ns         38.42
ConstructGenerator/2                                            281 ns      2.42 ns        116.11
ConstructGenerator/4                                            324 ns      2.42 ns        133.88
ConstructTrivialElement                                        31.4 ns      33.5 ns          0.93
ConstructTrivialElement<int>                                   34.5 ns      36.6 ns          0.94
ConstructTrivialElement<long long>                             44.5 ns      45.2 ns          0.98
ConstructTrivialElement<long>                                  34.6 ns      36.6 ns          0.94
ConstructTrivialElement<mpq_class>                             75.2 ns      76.1 ns          0.98
ConstructTrivialElement<mpz_class>                             39.5 ns      42.5 ns          0.92
ConstructTrivialElement<renf_elem_class>                       43.9 ns      41.5 ns          1.05
ConstructTrivialElementInField<int>/1                          34.2 ns      35.2 ns          0.97
ConstructTrivialElementInField<int>/2                          34.2 ns      35.2 ns          0.97
ConstructTrivialElementInField<int>/4                          34.3 ns      35.2 ns          0.97
ConstructTrivialElementInField<long long>/1                    44.3 ns      43.9 ns          1.00
ConstructTrivialElementInField<long long>/2                    44.2 ns      43.9 ns          1.00
ConstructTrivialElementInField<long long>/4                    44.2 ns      43.9 ns          1.00
ConstructTrivialElementInField<long>/1                         34.2 ns      35.2 ns          0.97
ConstructTrivialElementInField<long>/2                         34.2 ns      35.2 ns          0.97
ConstructTrivialElementInField<long>/4                         34.2 ns      35.2 ns          0.97
ConstructTrivialElementInField<mpq_class>/1                    57.0 ns      60.3 ns          0.94
ConstructTrivialElementInField<mpq_class>/2                    57.0 ns      60.2 ns          0.94
ConstructTrivialElementInField<mpq_class>/4                    57.0 ns      60.0 ns          0.95
ConstructTrivialElementInField<mpz_class>/1                    39.0 ns      42.1 ns          0.92
ConstructTrivialElementInField<mpz_class>/2                    39.1 ns      42.2 ns          0.92
ConstructTrivialElementInField<mpz_class>/4                    39.0 ns      42.1 ns          0.92
ConstructTrivialElementInField<renf_elem_class>/1              43.9 ns      41.1 ns          1.06
ConstructTrivialElementInField<renf_elem_class>/2              43.9 ns      41.1 ns          1.06
ConstructTrivialElementInField<renf_elem_class>/4              43.9 ns      41.1 ns          1.06
ConstructTrivialField                                          2.07 ns      2.44 ns          0.84
Equality<int>/1                                                7.98 ns      7.25 ns          1.10
Equality<int>/2                                                5.32 ns      5.53 ns          0.96
Equality<int>/4                                                5.25 ns      5.53 ns          0.94
Equality<long long>/1                                          38.6 ns      37.6 ns          1.02
Equality<long long>/2                                          8.29 ns      7.60 ns          1.09
Equality<long long>/4                                          8.30 ns      7.61 ns          1.09
Equality<long>/1                                               7.99 ns      7.71 ns          1.03
Equality<long>/2                                               5.52 ns      6.01 ns          0.91
Equality<long>/4                                               5.20 ns      5.53 ns          0.94
Equality<mpq_class>/1                                          20.2 ns      20.9 ns          0.96
Equality<mpq_class>/2                                          4.85 ns      5.87 ns          0.82
Equality<mpq_class>/4                                          5.18 ns      6.22 ns          0.83
Equality<mpz_class>/1                                          13.1 ns      14.5 ns          0.90
Equality<mpz_class>/2                                          5.79 ns      5.53 ns          1.04
Equality<mpz_class>/4                                          5.82 ns      5.53 ns          1.05
Equality<renf_elem_class>/1                                    10.0 ns      12.1 ns          0.82
Equality<renf_elem_class>/2                                    7.95 ns      10.6 ns          0.75
Equality<renf_elem_class>/4                                    9.82 ns      10.9 ns          0.90
Inequality<int>/1                                              6.91 ns      7.60 ns          0.90
Inequality<int>/2                                              4.49 ns      5.87 ns          0.76
Inequality<int>/4                                              4.49 ns      5.88 ns          0.76
Inequality<long long>/1                                        38.8 ns      38.0 ns          1.02
Inequality<long long>/2                                        7.94 ns      7.60 ns          1.04
Inequality<long long>/4                                        7.95 ns      7.60 ns          1.04
Inequality<long>/1                                             7.26 ns      7.95 ns          0.91
Inequality<long>/2                                             4.84 ns      6.21 ns          0.77
Inequality<long>/4                                             4.85 ns      6.21 ns          0.78
Inequality<mpq_class>/1                                        19.9 ns      21.0 ns          0.94
Inequality<mpq_class>/2                                        4.84 ns      5.87 ns          0.82
Inequality<mpq_class>/4                                        4.84 ns      6.22 ns          0.77
Inequality<mpz_class>/1                                        13.5 ns      14.9 ns          0.90
Inequality<mpz_class>/2                                        6.13 ns      5.87 ns          1.04
Inequality<mpz_class>/4                                        6.19 ns      5.87 ns          1.05
Inequality<renf_elem_class>/1                                  10.0 ns      11.7 ns          0.85
Inequality<renf_elem_class>/2                                  7.95 ns      10.3 ns          0.77
Inequality<renf_elem_class>/4                                  9.92 ns      10.6 ns          0.93
Nonzero/1                                                      6.91 ns      4.14 ns          1.66
Nonzero/2                                                      4.49 ns      3.80 ns          1.18
Nonzero/4                                                      4.16 ns      4.14 ns          1.00
TrivialAddition<int>/1                                         46.0 ns      46.1 ns          0.99
TrivialAddition<int>/2                                         44.6 ns      46.0 ns          0.96
TrivialAddition<int>/4                                          124 ns       124 ns          1.00
TrivialAddition<long long>/1                                   60.9 ns      64.0 ns          0.95
TrivialAddition<long long>/2                                   62.3 ns      64.3 ns          0.96
TrivialAddition<long long>/4                                    145 ns       146 ns          0.99
TrivialAddition<long>/1                                        49.0 ns      47.7 ns          1.02
TrivialAddition<long>/2                                        47.7 ns      46.9 ns          1.01
TrivialAddition<long>/4                                         123 ns       124 ns          0.99
TrivialAddition<mpq_class>/1                                   95.3 ns      95.1 ns          1.00
TrivialAddition<mpq_class>/2                                    104 ns       110 ns          0.94
TrivialAddition<mpq_class>/4                                   95.6 ns      99.2 ns          0.96
TrivialAddition<mpz_class>/1                                   43.7 ns      44.6 ns          0.97
TrivialAddition<mpz_class>/2                                   48.7 ns      51.6 ns          0.94
TrivialAddition<mpz_class>/4                                    129 ns       127 ns          1.01
TrivialAddition<renf_elem_class>/1                             46.3 ns      45.8 ns          1.01
TrivialAddition<renf_elem_class>/2                              280 ns      48.7 ns          5.74
TrivialAddition<renf_elem_class>/4                              333 ns       130 ns          2.56
TrivialAssignment<int>/1                                       41.3 ns      13.8 ns          2.99
TrivialAssignment<int>/2                                       41.5 ns      39.1 ns          1.06
TrivialAssignment<int>/4                                       55.6 ns      52.3 ns          1.06
TrivialAssignment<long long>/1                                 49.0 ns      24.3 ns          2.01
TrivialAssignment<long long>/2                                 51.4 ns      48.7 ns          1.05
TrivialAssignment<long long>/4                                 72.0 ns      61.5 ns          1.17
TrivialAssignment<long>/1                                      41.3 ns      14.1 ns          2.92
TrivialAssignment<long>/2                                      41.6 ns      39.1 ns          1.06
TrivialAssignment<long>/4                                      56.4 ns      52.3 ns          1.07
TrivialAssignment<mpq_class>/1                                 82.6 ns      58.5 ns          1.41
TrivialAssignment<mpq_class>/2                                 84.3 ns      81.8 ns          1.03
TrivialAssignment<mpq_class>/4                                  106 ns       103 ns          1.02
TrivialAssignment<mpz_class>/1                                 49.6 ns      21.7 ns          2.28
TrivialAssignment<mpz_class>/2                                 51.4 ns      45.1 ns          1.13
TrivialAssignment<mpz_class>/4                                 60.5 ns      58.7 ns          1.03
TrivialAssignment<renf_elem_class>/1                           17.2 ns      16.5 ns          1.04
TrivialAssignment<renf_elem_class>/2                           38.2 ns      40.6 ns          0.94
TrivialAssignment<renf_elem_class>/4                           52.4 ns      54.3 ns          0.96
TrivialDivision<int>/1                                         54.7 ns      54.8 ns          0.99
TrivialDivision<int>/2                                         69.8 ns      70.5 ns          0.99
TrivialDivision<int>/4                                         66.7 ns      68.9 ns          0.96
TrivialDivision<long long>/1                                   70.7 ns      72.7 ns          0.97
TrivialDivision<long long>/2                                   86.6 ns      88.1 ns          0.98
TrivialDivision<long long>/4                                   84.5 ns      86.7 ns          0.97
TrivialDivision<long>/1                                        53.3 ns      55.0 ns          0.96
TrivialDivision<long>/2                                        68.9 ns      70.4 ns          0.97
TrivialDivision<long>/4                                        66.7 ns      68.0 ns          0.98
TrivialDivision<mpq_class>/1                                    105 ns       108 ns          0.97
TrivialDivision<mpq_class>/2                                    161 ns       161 ns          1.00
TrivialDivision<mpq_class>/4                                    167 ns       167 ns          1.00
TrivialDivision<mpz_class>/1                                   62.8 ns      62.2 ns          1.00
TrivialDivision<mpz_class>/2                                   79.2 ns      79.3 ns          0.99
TrivialDivision<mpz_class>/4                                   75.3 ns      75.9 ns          0.99
TrivialDivision<renf_elem_class>/1                             95.7 ns      94.8 ns          1.00
TrivialDivision<renf_elem_class>/2                              984 ns      78.8 ns         12.48
TrivialDivision<renf_elem_class>/4                              575 ns      75.6 ns          7.60
TrivialMultiplication<int>/1                                   31.0 ns      30.1 ns          1.02
TrivialMultiplication<int>/2                                   46.6 ns      46.8 ns          0.99
TrivialMultiplication<int>/4                                   62.0 ns      62.7 ns          0.98
TrivialMultiplication<long long>/1                             44.4 ns      45.3 ns          0.98
TrivialMultiplication<long long>/2                             64.3 ns      63.6 ns          1.01
TrivialMultiplication<long long>/4                             80.1 ns      81.6 ns          0.98
TrivialMultiplication<long>/1                                  30.4 ns      30.0 ns          1.01
TrivialMultiplication<long>/2                                  46.5 ns      47.2 ns          0.98
TrivialMultiplication<long>/4                                  62.0 ns      63.5 ns          0.97
TrivialMultiplication<mpq_class>/1                             90.8 ns      89.0 ns          1.02
TrivialMultiplication<mpq_class>/2                              107 ns       111 ns          0.96
TrivialMultiplication<mpq_class>/4                              116 ns       120 ns          0.96
TrivialMultiplication<mpz_class>/1                             36.6 ns      36.8 ns          0.99
TrivialMultiplication<mpz_class>/2                             45.0 ns      45.6 ns          0.98
TrivialMultiplication<mpz_class>/4                             62.1 ns      58.9 ns          1.05
TrivialMultiplication<renf_elem_class>/1                       41.7 ns      39.8 ns          1.04
TrivialMultiplication<renf_elem_class>/2                        323 ns      47.1 ns          6.85
TrivialMultiplication<renf_elem_class>/4                        391 ns      64.7 ns          6.04
TrivialReverseAddition/1                                       49.2 ns      47.9 ns          1.02
TrivialReverseAddition/2                                        287 ns       138 ns          2.07
TrivialReverseAddition/4                                        352 ns       199 ns          1.76
TrivialReverseDivision/1                                       80.7 ns      77.6 ns          1.03
TrivialReverseDivision/2                                        313 ns      48.2 ns          6.49
TrivialReverseDivision/4                                        311 ns       130 ns          2.39
TrivialReverseMultiplication/1                                 44.3 ns      42.0 ns          1.05
TrivialReverseMultiplication/2                                  314 ns       167 ns          1.88
TrivialReverseMultiplication/4                                  390 ns       221 ns          1.76
TrivialReverseSubtraction/1                                    45.8 ns      46.5 ns          0.98
TrivialReverseSubtraction/2                                     288 ns       142 ns          2.02
TrivialReverseSubtraction/4                                     345 ns       195 ns          1.76
TrivialSubtraction<int>/1                                      43.5 ns      43.9 ns          0.99
TrivialSubtraction<int>/2                                      45.6 ns      46.0 ns          0.99
TrivialSubtraction<int>/4                                       128 ns       125 ns          1.02
TrivialSubtraction<long long>/1                                60.9 ns      62.0 ns          0.98
TrivialSubtraction<long long>/2                                64.8 ns      65.7 ns          0.98
TrivialSubtraction<long long>/4                                 162 ns       143 ns          1.13
TrivialSubtraction<long>/1                                     43.5 ns      43.9 ns          0.99
TrivialSubtraction<long>/2                                     45.6 ns      45.9 ns          0.99
TrivialSubtraction<long>/4                                      126 ns       124 ns          1.01
TrivialSubtraction<mpq_class>/1                                 119 ns      95.3 ns          1.24
TrivialSubtraction<mpq_class>/2                                 128 ns       104 ns          1.23
TrivialSubtraction<mpq_class>/4                                 230 ns       190 ns          1.21
TrivialSubtraction<mpz_class>/1                                47.8 ns      42.8 ns          1.11
TrivialSubtraction<mpz_class>/2                                60.9 ns      54.2 ns          1.12
TrivialSubtraction<mpz_class>/4                                 152 ns       130 ns          1.16
TrivialSubtraction<renf_elem_class>/1                          46.6 ns      45.9 ns          1.01
TrivialSubtraction<renf_elem_class>/2                           281 ns      55.0 ns          5.10
TrivialSubtraction<renf_elem_class>/4                           335 ns       130 ns          2.57
ZeroEquality/1                                                 6.92 ns      5.37 ns          1.28
ZeroEquality/2                                                 4.49 ns      4.50 ns          0.99
ZeroEquality/4                                                 4.15 ns      5.41 ns          0.76
ZeroInequality/1                                               6.93 ns      5.32 ns          1.30
ZeroInequality/2                                               4.49 ns      4.49 ns          1.00
ZeroInequality/4                                               4.15 ns      5.52 ns          0.75
saraedum commented 3 years ago

@videlec, let's discuss this later today.

The only interface change is a renf_elem_class::reset(renf_class) method that sets the number field of an element. It's a somewhat weird operation that zeros the element but changes the field the element it is contained in.

codecov[bot] commented 3 years ago

Codecov Report

Merging #180 (ac19a86) into master (462336f) will increase coverage by 2.13%. The diff coverage is 98.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
+ Coverage   90.21%   92.34%   +2.13%     
==========================================
  Files         105      105              
  Lines        1951     2050      +99     
==========================================
+ Hits         1760     1893     +133     
+ Misses        191      157      -34     
Impacted Files Coverage Δ
libeantic/e-antic/renf_class.hpp 100.00% <ø> (ø)
libeantic/e-antic/renf_elem_class.hpp 100.00% <ø> (ø)
libeantic/src/renf_elem/add_fmpq.c 100.00% <ø> (ø)
libeantic/src/renf_elem/clear.c 100.00% <ø> (ø)
libeantic/src/renf_elem/init.c 100.00% <ø> (ø)
libeantic/src/renf_elem/sub_fmpq.c 100.00% <ø> (ø)
libeantic/srcxx/renf_elem_class.cpp 97.69% <98.21%> (+7.19%) :arrow_up:
libeantic/e-antic/cereal.hpp 100.00% <100.00%> (ø)
libeantic/srcxx/renf_class.cpp 91.17% <100.00%> (+0.17%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 462336f...ac19a86. Read the comment docs.

videlec commented 3 years ago

Sorted by ratio we have the very bads

TrivialDivision<int>/4                                         66.7 ns        120 ns          0.55
TrivialDivision<long>/4                                        66.7 ns        121 ns          0.55
TrivialDivision<long long>/4                                   84.5 ns        136 ns          0.62

and the very goods

TrivialReverseAddition/4                                        352 ns        172 ns          2.04
TrivialReverseSubtraction/4                                     345 ns        168 ns          2.05
TrivialAssignment<long long>/1                                 49.0 ns       23.4 ns          2.09
TrivialAssignment<mpz_class>/1                                 49.6 ns       21.4 ns          2.31
TrivialReverseMultiplication/2                                  314 ns        135 ns          2.32
TrivialReverseDivision/4                                        311 ns        130 ns          2.39
TrivialSubtraction<renf_elem_class>/4                           335 ns        133 ns          2.51
TrivialAddition<renf_elem_class>/4                              333 ns        128 ns          2.60
TrivialReverseSubtraction/2                                     288 ns        109 ns          2.64
TrivialAssignment<int>/1                                       41.3 ns       15.4 ns          2.68
TrivialReverseAddition/2                                        287 ns        104 ns          2.75
TrivialAssignment<long>/1                                      41.3 ns       14.4 ns          2.86
TrivialSubtraction<renf_elem_class>/2                           281 ns       50.7 ns          5.54
TrivialAddition<renf_elem_class>/2                              280 ns       49.6 ns          5.64
TrivialReverseDivision/2                                        313 ns       50.3 ns          6.22
TrivialMultiplication<renf_elem_class>/4                        391 ns       60.4 ns          6.47
TrivialMultiplication<renf_elem_class>/2                        323 ns       45.2 ns          7.14
TrivialDivision<renf_elem_class>/4                              575 ns       73.4 ns          7.83
TrivialDivision<renf_elem_class>/2                              984 ns       76.1 ns         12.93
videlec commented 3 years ago

As discussed on zoom it would be nice to open an issue for the suspicious ones such as ConstructGenerator you mentioned.