Open ALTree opened 7 years ago
Thanks for raising this as a separate issue. I'd like to point out that it's not just that Go is slower, but the way the time increases with the size of the big.Int
. In my testing:
gmp:
500,000 - 15 ms
5,000,000 - 215 ms (14.3x increase)
50,000,000 - 3990 ms (18.6x increase)
500,000,000 - 57617 ms (14.4x increase)
Go big.Int:
50,000 - 3 ms
500,000 - 282 ms (94x increase, 19x slower than gmp)
5,000,000 - 37236 ms (132x increase, 173x slower than gmp)
50,000,000 - 3,587,249 ms (96x increase, 899x slower than gmp)
The increase in time with the increase in number of digits is not quite linear even with gmp, but it's close (1.5x to 2x linear). The increase with Go seems to be quadratic (number of digits increases 10x, time increases 100x).
This doesn't happen when converting to a base that's a power of 2, which uses a different code path. Base 8 takes only 109 ms for 10e50,000,000 for example.
pprof -top output:
3583.92s of 3587.94s total (99.89%)
Dropped 19 nodes (cum <= 17.94s)
flat flat% sum% cum cum%
1613.02s 44.96% 44.96% 1613.02s 44.96% runtime.lostProfileData
995.63s 27.75% 72.71% 995.63s 27.75% math/big.subVV
946.63s 26.38% 99.09% 946.63s 26.38% math/big.mulAddVWW
24.58s 0.69% 99.77% 24.58s 0.69% math/big.addMulVVW
3.10s 0.086% 99.86% 27.95s 0.78% math/big.basicMul
0.57s 0.016% 99.88% 33.86s 0.94% math/big.karatsuba
0.33s 0.0092% 99.89% 1940.49s 54.08% math/big.nat.divLarge
0.06s 0.0017% 99.89% 1940.55s 54.09% math/big.nat.convertWords
0 0% 99.89% 1974.47s 55.03% main.bigIntToString
0 0% 99.89% 1974.47s 55.03% main.main
0 0% 99.89% 1974.47s 55.03% math/big.(*Int).String
0 0% 99.89% 1974.47s 55.03% math/big.(*Int).Text
0 0% 99.89% 33.88s 0.94% math/big.divisors
0 0% 99.89% 1940.49s 54.08% math/big.nat.div
0 0% 99.89% 1974.44s 55.03% math/big.nat.itoa
0 0% 99.89% 33.86s 0.94% math/big.nat.mul
0 0% 99.89% 1974.47s 55.03% runtime.goexit
0 0% 99.89% 1974.47s 55.03% runtime.main
@realityexists Thanks for the measurements, that's useful. The reason why it's faster (and why there's a different implementation) for a conversion base that's a power of 2 is of course that the conversion is trivial in that case: The result merely requires reading out the existing bits.
Any base that's not a power of 2 requires real work. The basic conversion algorithm is quadratic so I'm not too surprised. But there's better approaches.
@griesemer More than half the time is consumed in the divLarge calls from convertWords, which uses a divide and conquer algorithm to split the input. math/big doesn't currently have an implementation of Burnikel and Ziegler's Fast Recursive Division which would reduce those big divisions' asymptotic complexity to the multiplication time, ~n^1.6 for karatsuba. Does it make sense to open a separate performance issue to implement Fast Recursive Division that would speed up not only string output but any large divisions?
@bmkessler Sure, that sounds good. Faster division would be beneficial in general.
A relevant paper that avoids most of the divisions is:
Cyril Bouvier, Paul Zimmermann. Division-Free Binary-to-Decimal Conversion. IEEE Trans- actions on Computers, Institute of Electrical and Electronics Engineers, 2014, 63 (8), pp.1895- 1901 https://hal.inria.fr/hal-00864293v2/document
It contains both quadratic basecase and sub-quadratic algorithms. Each use one division to compute a floating point approximation of a/(base^k) and then only use multiplications in the splitting and digit generating loop, i.e. computing most significant digits first. They achieve a speedup of about 55% and 19% faster than GMP in the basecase and subquadratic cases respectively.
Also, in the intro they mention that there were not able to find much relevant other literature on efficient multiprecision radix conversion besides the standard references:
D. E. Knuth, The Art of Computer Programming vol. 2: Seminumerical Algorithms, http://www-cs-staff.stanford.edu/∼knuth/taocp.html
R. P. Brent and P. Zimmermann, Modern Computer Arithmetic http://www.loria.fr/∼zimmerma/mca/pub226.html
Was toying around with Go, implementing a factorial calculation with math/big
. Turns out computing 1,000,000! takes less time than actually stringifying the result. :smiley:
Il giorno 6 gennaio 2018, alle ore 22:25, Alex Ewerlöf notifications@github.com ha scritto:
Was toying around with Go, implementing a factorial calculation with math/big. Turns out computing 1,000,000! takes less time than actually stringifying the result.
This is normal, expected, and common to every reasonable implementation of bignums.
Well, I haven't tried them all so I don't know. But hopefully the Go implementation will be faster in the future.
Just thought I would update that the recursive division change https://golang.org/cl/172018 improves string formatting considerably as expected. For the example program above about a 4x improvement.
bmkessler$ gotip build main.go; time ./main > /dev/null
real 0m9.350s
user 0m10.226s
sys 0m0.208s
bmkessler$ ~/dev/go/bin/go build main.go; time ./main > /dev/null
real 0m1.905s
user 0m1.906s
sys 0m0.004s
(Forked out from #11068 which is about slow
big.Float
printing but will likely be fixed by not converting in full precision before printing).The following program:
which computes
2**5000000
and prints it, takes about 7 seconds on my machine to print the 1505151 characters-long answer.Almost all the time is spent generating the
String
representation ofn
. For comparison, the following equivalentgmp
program:which prints the same string, is ~20x faster.