Closed jgertm closed 6 years ago
Awesome!
I suspected that our not-invented-here binaryGCD
is not that good, but your results show that it is tremendously worse.
@jgertm which version of GHC are you using?
I run your benchmark on my box with GHC 8.4.1 x64:
GCD/large coprimes/Prelude.gcd mean 67.30 ns ( +- 1.517 ns )
GCD/large coprimes/binaryGCD mean 120.1 ns ( +- 2.085 ns )
GCD/large coprimes/Prelude.coprime mean 68.65 ns ( +- 4.798 ns )
GCD/large coprimes/coprime mean 123.7 ns ( +- 7.513 ns )
GCD/powers of 2/Prelude.gcd mean 48.41 ns ( +- 1.175 ns )
GCD/powers of 2/binaryGCD mean 30.72 ns ( +- 781.1 ps )
GCD/powers of 2/Prelude.coprime mean 47.16 ns ( +- 1.209 ns )
GCD/powers of 2/coprime mean 23.58 ns ( +- 437.1 ps )
GCD/power of 23/Prelude.gcd mean 39.66 ns ( +- 4.347 ns )
GCD/power of 23/binaryGCD mean 65.26 ns ( +- 10.23 ns )
GCD/power of 23/Prelude.coprime mean 37.00 ns ( +- 2.705 ns )
GCD/power of 23/coprime mean 66.35 ns ( +- 13.97 ns )
It is interesting that on my box binaryGCD
is 30% faster on powers of two, but on your box it is actually slower.
binaryGCD
heavily relies on Math.NumberTheory.Utils.trailZeros#
. Newer GHCs (>= 7.10) has a built-in ctz#
function, which can be used instead. Putting
trailZeros# :: Word# -> Int#
trailZeros# w = word2Int# (ctz# w)
into Math.NumberTheory.Utils
results in following benchmarks:
GCD/large coprimes/Prelude.gcd mean 67.86 ns ( +- 964.1 ps )
GCD/large coprimes/binaryGCD mean 57.80 ns ( +- 2.449 ns )
GCD/large coprimes/Prelude.coprime mean 67.86 ns ( +- 2.095 ns )
GCD/large coprimes/coprime mean 58.43 ns ( +- 10.66 ns )
GCD/powers of 2/Prelude.gcd mean 49.17 ns ( +- 4.313 ns )
GCD/powers of 2/binaryGCD mean 26.42 ns ( +- 2.844 ns )
GCD/powers of 2/Prelude.coprime mean 47.54 ns ( +- 1.062 ns )
GCD/powers of 2/coprime mean 23.61 ns ( +- 260.5 ps )
GCD/power of 23/Prelude.gcd mean 38.32 ns ( +- 1.663 ns )
GCD/power of 23/binaryGCD mean 40.04 ns ( +- 1.119 ns )
GCD/power of 23/Prelude.coprime mean 36.91 ns ( +- 1.700 ns )
GCD/power of 23/coprime mean 37.02 ns ( +- 816.6 ps )
Here binaryGCD
is either as fast as Prelude.gcd
, or up to 2x faster.
@jgertm could you please try to reproduce this on your box?
I'll merge it for now due to upcoming migration from criterion
to gauge
, thanks for contribution!
I took a stab at #80.
So far, I have only benchmarked cases 1 and 2. These are the preliminary results on my box:
I'd appreciate any feedback, aspecially regarding the benchmark cases.