chfast / intx

Extended precision integer C++ library
Apache License 2.0
129 stars 29 forks source link

Optimize less-than #264

Closed chfast closed 2 years ago

chfast commented 2 years ago

The new implementation is similar to what was there: we select which half to compare and do 128-bit less-than. For small values this is equivalent to do subtraction + borrow check, for big values though we get 25% boost.

chfast commented 2 years ago

Clang

Comparing o/compare to o/compare-lt
Benchmark                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------
compare<lt_public>/64                  -0.1246         -0.1246             2             2             2             2
compare<lt_public>/128                 -0.1231         -0.1231             2             2             2             2
compare<lt_public>/192                 -0.3112         -0.3112             2             2             2             2
compare<lt_public>/256                 -0.3114         -0.3114             2             2             2             2
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
compare<lt_public>/64         2.37 ns         2.37 ns    591687936
compare<lt_public>/128        2.36 ns         2.36 ns    592440064
compare<lt_public>/192        2.36 ns         2.36 ns    591543552
compare<lt_public>/256        2.37 ns         2.37 ns    592008704
compare<lt_wordcmp>/64        2.10 ns         2.10 ns    669979392
compare<lt_wordcmp>/128       1.87 ns         1.87 ns    736609024
compare<lt_wordcmp>/192       4.13 ns         4.13 ns    339005440
compare<lt_wordcmp>/256       3.12 ns         3.12 ns    454742528
compare<lt_halves>/64         2.36 ns         2.36 ns    592504320
compare<lt_halves>/128        2.37 ns         2.37 ns    592627968
compare<lt_halves>/192        2.36 ns         2.36 ns    592624896
compare<lt_halves>/256        2.36 ns         2.36 ns    593662208
compare<lt_llvm>/64           2.03 ns         2.03 ns    688150016
compare<lt_llvm>/128          2.04 ns         2.04 ns    688088576
compare<lt_llvm>/192          2.05 ns         2.05 ns    688033792
compare<lt_llvm>/256          2.04 ns         2.04 ns    688006912

GCC

Comparing o/compare to o/compare-lt
Benchmark                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------
compare<lt_public>/64                  -0.1671         -0.1671             3             2             3             2
compare<lt_public>/128                 -0.1650         -0.1650             3             2             3             2
compare<lt_public>/192                 -0.2578         -0.2578             3             2             3             2
compare<lt_public>/256                 -0.2577         -0.2577             3             2             3             2
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
compare<lt_public>/64         2.09 ns         2.09 ns    668606208
compare<lt_public>/128        2.10 ns         2.10 ns    668350720
compare<lt_public>/192        1.87 ns         1.87 ns    750063616
compare<lt_public>/256        1.87 ns         1.87 ns    750004224
compare<lt_sub>/64            3.51 ns         3.51 ns    398873856
compare<lt_sub>/128           3.51 ns         3.51 ns    398906880
compare<lt_sub>/192           3.52 ns         3.52 ns    398810624
compare<lt_sub>/256           3.51 ns         3.51 ns    398874880
compare<lt_wordcmp>/64        2.09 ns         2.09 ns    669849344
compare<lt_wordcmp>/128       5.00 ns         5.00 ns    278123008
compare<lt_wordcmp>/192       5.28 ns         5.28 ns    264916224
compare<lt_wordcmp>/256       5.24 ns         5.24 ns    264640768
compare<lt_halves>/64         2.55 ns         2.55 ns    552285696
compare<lt_halves>/128        2.53 ns         2.53 ns    552007936
compare<lt_halves>/192        2.54 ns         2.54 ns    552038656
compare<lt_halves>/256        2.54 ns         2.54 ns    552642560
codecov-commenter commented 2 years ago

Codecov Report

Merging #264 (f4cff75) into master (ce3f7a1) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #264   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines         1911      1912    +1     
=========================================
+ Hits          1911      1912    +1     
Flag Coverage Δ
32bit 100.00% <100.00%> (ø)
gcc 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
include/intx/intx.hpp 100.00% <100.00%> (ø)