[proposal] Subquadratic algorithm for todigits

trizen commented 5 years ago

Hi. I would like to suggest a divide-and-conquer subquadratic algorithm to be used in todigits(n,b), presented in the book Modern Computer Arithmetic, by Richard P. Brent and Paul Zimmermann.

A simple implementation in Perl, using Math::GMPz, would be:


use 5.020;
use strict;
use warnings;

use Math::GMPz;
use Math::Prime::Util::GMP qw(todigits logint vecsum factorial);

use experimental qw(signatures);

sub FastIntegerOutput ($A, $B) {    # B is an unsigned native integer

    $A = Math::GMPz->new("$A");

    # Find k such that B^(2k - 2) <= A < B^(2k)
    my $k = (logint($A, $B) >> 1) + 1;

    my $Q = Math::GMPz::Rmpz_init();
    my $R = Math::GMPz::Rmpz_init();

    sub ($A, $k) {

        if (Math::GMPz::Rmpz_cmp_ui($A, $B) < 0) {
            return Math::GMPz::Rmpz_get_ui($A);
        }

        my $t = Math::GMPz::Rmpz_init();
        Math::GMPz::Rmpz_ui_pow_ui($t, $B, 2 * ($k - 1));    # can this be optimized away?

        if (Math::GMPz::Rmpz_cmp($t, $A) > 0) {
            --$k;
        }

        Math::GMPz::Rmpz_ui_pow_ui($t, $B, $k);
        Math::GMPz::Rmpz_divmod($Q, $R, $A, $t);

        my $w = ($k + 1) >> 1;
        Math::GMPz::Rmpz_set($t, $Q);

        my @right = __SUB__->($R, $w);
        my @left  = __SUB__->($t, $w);

        (@left, (0) x ($k - scalar(@right)), @right);
    }->($A, $k);
}

foreach my $B (2 .. 100) {    # run some tests
    my $N = factorial($B);    # int(rand(~0));

    my @a = todigits($N, $B);
    my @b = FastIntegerOutput($N, $B);

    if ("@a" ne "@b") {
        die "Error for FastIntegerOutput($N, $B): (@a) != (@b)";
    }
}

use Time::HiRes qw(tv_interval gettimeofday);

my $N = factorial(1e5);
my $B = 100;

my $time = [gettimeofday];

say "Sum: ", vecsum(FastIntegerOutput($N, $B));
say "FastIntegerOutput took ", tv_interval($time), " seconds";

$time = [gettimeofday];

say "Sum: ", vecsum(todigits($N, $B));
say "MPU_GMP took ", tv_interval($time), " seconds";

Output:

Sum: 10658934
FastIntegerOutput took 0.773024 seconds
Sum: 10658934
MPU_GMP took 33.418434 seconds

Furthermore, if only the sum of the digits is needed, we can make the implementation even faster by adding the digits right away:

use 5.020;
use strict;
use warnings;

use Math::GMPz;
use ntheory qw(:all);
use experimental qw(signatures);

sub FastSumOfDigits ($A, $B) {

    $A = Math::GMPz->new("$A");

    # Find k such that B^(2k - 2) <= A < B^(2k)
    my $k = (logint($A, $B) >> 1) + 1;

    my $Q = Math::GMPz::Rmpz_init();
    my $R = Math::GMPz::Rmpz_init();

    sub ($A, $k) {

        if (Math::GMPz::Rmpz_cmp_ui($A, $B) < 0) {
            return Math::GMPz::Rmpz_get_ui($A);
        }

        my $w = ($k + 1) >> 1;
        my $t = Math::GMPz::Rmpz_init();

        Math::GMPz::Rmpz_ui_pow_ui($t, $B, $k);
        Math::GMPz::Rmpz_divmod($Q, $R, $A, $t);
        Math::GMPz::Rmpz_set($t, $Q);

        __SUB__->($R, $w) + __SUB__->($t, $w);  # may overflow when B is very large
    }->($A, $k);
}

foreach my $B (2 .. 300) {    # run some tests
    my $N = factorial($B);    # int(rand(~0));

    my $x = vecsum(todigits($N, $B));
    my $y = FastSumOfDigits($N, $B);

    if ($x != $y) {
        die "Error for FastSumOfDigits($N, $B): $x != $y";
    }
}

foreach my $n (1 .. 6) {
    say "FastSumOfDigits((10^$n)!, 100) = ", FastSumOfDigits(factorial(10**$n), 100);
}

Output:

FastSumOfDigits((10^1)!, 100) = 153
FastSumOfDigits((10^2)!, 100) = 3663
FastSumOfDigits((10^3)!, 100) = 57420
FastSumOfDigits((10^4)!, 100) = 818136
FastSumOfDigits((10^5)!, 100) = 10658934
FastSumOfDigits((10^6)!, 100) = 131474079
perl FastSumOfDigits.pl  10.52s user 0.07s system 99% cpu 10.624 total

trizen commented 4 years ago

Added in https://github.com/danaj/Math-Prime-Util-GMP/commit/31dabd92bcac1b0995648bccf5770652074609a4. Many thanks!

danaj commented 4 years ago

Excellent! I just pushed some code that does this in GMP. It could be further optimized in a few ways, notably not allocating memory for each invocation (Q should easily go in place, R is a little harder but at worst could be moved in place). Calling logint() each call is also not ideal -- your code doesn't do that. On the other hand, I end recursion at 32 bits, and logint should be reasonably fast below 768 bits.

I also think there is a huge amount of wasted time in back-and-forth string conversions. That's a bigger issue in general.

I need to get a profiler running to understand where the time is being spent. For now at least it is a lot faster than before.

Before:

Sum: 10658934
FastIntegerOutput took 1.382698 seconds
Sum: 10658934
MPU_GMP took 13.51419 seconds

After:

Sum: 10658934
FastIntegerOutput took 1.436776 seconds
Sum: 10658934
MPU_GMP took 0.254289 seconds

Before:

time perl fsd.pl
FastSumOfDigits((10^1)!, 100) = 153
FastSumOfDigits((10^2)!, 100) = 3663
FastSumOfDigits((10^3)!, 100) = 57420
FastSumOfDigits((10^4)!, 100) = 818136
FastSumOfDigits((10^5)!, 100) = 10658934
FastSumOfDigits((10^6)!, 100) = 131474079

real    0m16.529s
user    0m16.443s
sys 0m0.056s

After:

$ make >/dev/null && time mpu 'say vecsum(Math::Prime::Util::GMP::todigits(factorial(10**$_),100)) for 1..6'
153
3663
57420
818136
10658934
131474079

real    0m4.426s
user    0m4.299s
sys 0m0.087s

using the vecsum(todigits(...)) method previously took over 34 minutes (ouch!).

For summation it really looks like I should add another argument to sumdigits with the output base. Right now you really have to use vecsum for that, which isn't terrible in the new code, but certainly not optimal.

Let me know about any more ideas. You have the best suggestions!

danaj / Math-Prime-Util-GMP

[proposal] Subquadratic algorithm for todigits #13