math/big: big.Int String conversion is slow

ALTree commented 7 years ago

(Forked out from #11068 which is about slow big.Float printing but will likely be fixed by not converting in full precision before printing).

The following program:

package main

import (
    "fmt"
    "math/big"
)

func main() {
    n := new(big.Int)
    n.Exp(big.NewInt(2), big.NewInt(5000000), nil)
    fmt.Println(n)
}

which computes 2**5000000 and prints it, takes about 7 seconds on my machine to print the 1505151 characters-long answer.

 $ time ./test > /dev/null 

real    0m6.847s
user    0m6.848s
sys 0m0.004s

Almost all the time is spent generating the String representation of n. For comparison, the following equivalent gmp program:

#include <gmp.h>

int main(void)
{
    mpz_t n, b;
    mpz_inits(n, b);
    mpz_set_ui(b, 2);
    mpz_pow_ui(n, b, 5000000);
    gmp_printf("%Zd\n", n);
}

which prints the same string, is ~20x faster.

$ time ./a.out > /dev/null 

real    0m0.332s
user    0m0.328s
sys 0m0.000s

realityexists commented 7 years ago

Thanks for raising this as a separate issue. I'd like to point out that it's not just that Go is slower, but the way the time increases with the size of the big.Int. In my testing:

gmp:
500,000     - 15 ms
5,000,000   - 215 ms (14.3x increase)
50,000,000  - 3990 ms (18.6x increase)
500,000,000 - 57617 ms (14.4x increase)

Go big.Int:
50,000     - 3 ms
500,000    - 282 ms (94x increase, 19x slower than gmp)
5,000,000  - 37236 ms (132x increase, 173x slower than gmp)
50,000,000  - 3,587,249 ms (96x increase, 899x slower than gmp)

The increase in time with the increase in number of digits is not quite linear even with gmp, but it's close (1.5x to 2x linear). The increase with Go seems to be quadratic (number of digits increases 10x, time increases 100x).

realityexists commented 7 years ago

This doesn't happen when converting to a base that's a power of 2, which uses a different code path. Base 8 takes only 109 ms for 10e50,000,000 for example.

pprof -top output:

3583.92s of 3587.94s total (99.89%)
Dropped 19 nodes (cum <= 17.94s)
      flat  flat%   sum%        cum   cum%
  1613.02s 44.96% 44.96%   1613.02s 44.96%  runtime.lostProfileData
   995.63s 27.75% 72.71%    995.63s 27.75%  math/big.subVV
   946.63s 26.38% 99.09%    946.63s 26.38%  math/big.mulAddVWW
    24.58s  0.69% 99.77%     24.58s  0.69%  math/big.addMulVVW
     3.10s 0.086% 99.86%     27.95s  0.78%  math/big.basicMul
     0.57s 0.016% 99.88%     33.86s  0.94%  math/big.karatsuba
     0.33s 0.0092% 99.89%   1940.49s 54.08%  math/big.nat.divLarge
     0.06s 0.0017% 99.89%   1940.55s 54.09%  math/big.nat.convertWords
         0     0% 99.89%   1974.47s 55.03%  main.bigIntToString
         0     0% 99.89%   1974.47s 55.03%  main.main
         0     0% 99.89%   1974.47s 55.03%  math/big.(*Int).String
         0     0% 99.89%   1974.47s 55.03%  math/big.(*Int).Text
         0     0% 99.89%     33.88s  0.94%  math/big.divisors
         0     0% 99.89%   1940.49s 54.08%  math/big.nat.div
         0     0% 99.89%   1974.44s 55.03%  math/big.nat.itoa
         0     0% 99.89%     33.86s  0.94%  math/big.nat.mul
         0     0% 99.89%   1974.47s 55.03%  runtime.goexit
         0     0% 99.89%   1974.47s 55.03%  runtime.main

griesemer commented 7 years ago

@realityexists Thanks for the measurements, that's useful. The reason why it's faster (and why there's a different implementation) for a conversion base that's a power of 2 is of course that the conversion is trivial in that case: The result merely requires reading out the existing bits.

Any base that's not a power of 2 requires real work. The basic conversion algorithm is quadratic so I'm not too surprised. But there's better approaches.

bmkessler commented 7 years ago

@griesemer More than half the time is consumed in the divLarge calls from convertWords, which uses a divide and conquer algorithm to split the input. math/big doesn't currently have an implementation of Burnikel and Ziegler's Fast Recursive Division which would reduce those big divisions' asymptotic complexity to the multiplication time, ~n^1.6 for karatsuba. Does it make sense to open a separate performance issue to implement Fast Recursive Division that would speed up not only string output but any large divisions?

griesemer commented 7 years ago

@bmkessler Sure, that sounds good. Faster division would be beneficial in general.

bmkessler commented 7 years ago

A relevant paper that avoids most of the divisions is:

Cyril Bouvier, Paul Zimmermann. Division-Free Binary-to-Decimal Conversion. IEEE Trans- actions on Computers, Institute of Electrical and Electronics Engineers, 2014, 63 (8), pp.1895- 1901 https://hal.inria.fr/hal-00864293v2/document

It contains both quadratic basecase and sub-quadratic algorithms. Each use one division to compute a floating point approximation of a/(base^k) and then only use multiplications in the splitting and digit generating loop, i.e. computing most significant digits first. They achieve a speedup of about 55% and 19% faster than GMP in the basecase and subquadratic cases respectively.

Also, in the intro they mention that there were not able to find much relevant other literature on efficient multiprecision radix conversion besides the standard references:

D. E. Knuth, The Art of Computer Programming vol. 2: Seminumerical Algorithms, http://www-cs-staff.stanford.edu/∼knuth/taocp.html

R. P. Brent and P. Zimmermann, Modern Computer Arithmetic http://www.loria.fr/∼zimmerma/mca/pub226.html

alexewerlof commented 6 years ago

Was toying around with Go, implementing a factorial calculation with math/big. Turns out computing 1,000,000! takes less time than actually stringifying the result. :smiley:

olopierpa commented 6 years ago

Il giorno 6 gennaio 2018, alle ore 22:25, Alex Ewerlöf notifications@github.com ha scritto:

Was toying around with Go, implementing a factorial calculation with math/big. Turns out computing 1,000,000! takes less time than actually stringifying the result.

This is normal, expected, and common to every reasonable implementation of bignums.

alexewerlof commented 6 years ago

Well, I haven't tried them all so I don't know. But hopefully the Go implementation will be faster in the future.

bmkessler commented 5 years ago

Just thought I would update that the recursive division change https://golang.org/cl/172018 improves string formatting considerably as expected. For the example program above about a 4x improvement.

bmkessler$ gotip build main.go; time ./main > /dev/null

real    0m9.350s
user    0m10.226s
sys 0m0.208s
bmkessler$ ~/dev/go/bin/go build main.go; time ./main > /dev/null

real    0m1.905s
user    0m1.906s
sys 0m0.004s

golang / go

math/big: big.Int String conversion is slow #20906