aleaxit / gmpy

General Multi-Precision arithmetic for Python 2.6+/3+ (GMP, MPIR, MPFR, MPC)
https://gmpy2.readthedocs.io/en/latest/
GNU Lesser General Public License v3.0
506 stars 84 forks source link

Abnormal situation of converting "mpz" to "str" #489

Closed gpchn closed 1 week ago

gpchn commented 3 weeks ago

Hardware: Windows 10 PC (x86_64)

Python version: 3.12.1

gmpy2 version: 2.2.0

Problem: I am writing a program about calculating pi to n decimal places. It uses the Chudnovsky formula and binary splitting algorithm. Its calculation result is correct and the speed is very fast. Here is the code:

def chudnovsky_binsplit_mpz(digits:int):
    from gmpy2 import mpz, sqrt

    digits *= 2
    def binsplit(a, b):
        if b - a == 1:
            if a == 0:
                Pab = Qab = 1
            else:
                Pab = (6*a-5) * (2*a-1) * (6*a-1)
                Qab = 640320**3//24 * a**3
            Tab = (13591409 + 545140134 * a) * Pab
            if a & 1:
                Tab = -Tab
            return mpz(Pab), mpz(Qab), mpz(Tab)
        else:
            m = (a + b) // 2
            Pam, Qam, Tam = binsplit(a, m)
            Pmb, Qmb, Tmb = binsplit(m, b)
            Pab = Pam * Pmb
            Qab = Qam * Qmb
            Tab = Qmb * Tam + Pam * Tmb
            return Pab, Qab, Tab

    terms = digits // 14 + 1
    P, Q, T = binsplit(0, terms)
    return mpz(Q * 426880 * sqrt(10005 * mpz(10)**digits) // T)

def main():
    n = 1000000
    pi = str(chudnovsky_binsplit_mpz(n))
    print(pi)

if __name__ == "__main__":
    main()

The calculated result is "31415......0995". I am sure it is correct because it is the same as the result before acceleration.

The problem is that when I try to take the last number, it does not match the expected result:

print(pi[-1])
# printed 4, not 5

This is indeed confusing. So I started trying to debug:

def main():
    n = 1000000
    pi = str(chudnovsky_binsplit_mpz(n))
    print(pi)

    print(len(pi))
    # 1000001
    correct_pi = "31415926......0995" # Ctrl-C from terminal
    print(type(pi) == type(correct_pi) == str)
    # true
    print(pi == correct_pi)
    # false
    for i,j in zip(pi, correct_pi):
        if i != j:
            print("!")
    # (no "!" was printed)

So the length is correct, the type is correct, each character is correct, but it's just different, and slicing can also cause problems!

Is this a problem of CPython, gmpy2, or something else?...

gpchn commented 3 weeks ago

Just now, I tried to use math.isort() instead gmpy2. I recalculated and the last few digits became "8778"...

Even worse, when I print(pi[-1]) again, the output is 1...

I hate mathematics

update: I found online that the number 1000000th is actually 1

casevh commented 3 weeks ago

The key issue in your code it the use of sqrt versus isqrt. You've been encountering precision errors with sqrt. The following code should run correctly both with and without gmpy2. You will need to comment out the gmpy2 import line and uncomment the following lines. See comments in code. I have verified calculations to both 1_000_000 and 10_000_000 digits.

def chudnovsky_binsplit_mpz(digits:int):
    # Use just the next line only to use gmpy2.
    from gmpy2 import mpz, isqrt

    # Comment out the previous line and uncomment the following lines
    # to use native Python. Limited to 10_000_000 digits.

    # from math import isqrt
    # import sys
    # sys.set_int_max_str_digits(10_000_100)
    # mpz = int

    digits *= 2
    def binsplit(a, b):
        if b - a == 1:
            if a == 0:
                Pab = Qab = 1
            else:
                Pab = (6*a-5) * (2*a-1) * (6*a-1)
                Qab = 640320**3//24 * a**3
            Tab = (13591409 + 545140134 * a) * Pab
            if a & 1:
                Tab = -Tab
            return mpz(Pab), mpz(Qab), mpz(Tab)
        else:
            m = (a + b) // 2
            Pam, Qam, Tam = binsplit(a, m)
            Pmb, Qmb, Tmb = binsplit(m, b)
            Pab = Pam * Pmb
            Qab = Qam * Qmb
            Tab = Qmb * Tam + Pam * Tmb
            return Pab, Qab, Tab

    terms = digits // 14 + 1
    P, Q, T = binsplit(0, terms)
    # Use 'isqrt' (integer square root) instead of 'sqrt' (floating
    # point square root).
    #
    # Remove redundant 'mpz(...)' call.
    return Q * 426880 * isqrt(10005 * mpz(10)**digits) // T

def main():
    n = 10_000_000
    pi = str(chudnovsky_binsplit_mpz(n))
    # Print exactly 'n' digits.
    print(pi[:n])

if __name__ == "__main__":
    main()
gpchn commented 3 weeks ago

Thank you for response. But when I run your code, the result is incorrect. I tried several times and eventually found that it lost accuracy when n >= 16383 (perhaps only on my computer).

Also, interestingly, if only print(pi[-1]), it is correct(when n = 1000000).

I think this goes far beyond my knowledge of programming. The purpose of this issue is only to remind you if there are any bugs in gmpy2. If I wrote the wrong code, you can close this issue.

In fact, I have given up on continuing to delve into calculating pi or computer processing large numbers, which is too difficult.

casevh commented 1 week ago

Closing.