aleaxit / gmpy

General Multi-Precision arithmetic for Python 2.6+/3+ (GMP, MPIR, MPFR, MPC)
https://gmpy2.readthedocs.io/en/latest/
GNU Lesser General Public License v3.0
510 stars 86 forks source link

gmpy2 Sieve Example Code Problem #492

Open Landryj opened 1 month ago

Landryj commented 1 month ago

When I run the Sieve of Eratosthenes example code (Python 3.11.5, Win 11 Pro, VS Code editor, RAM-32GB), it fails at around 4.5*10^9. The error message is: "gmp: overflow in mpz type". I've also run the code with WSL on a machine with nearly 200GB RAM. Previously, I've had gotten more extensive messages previously. image

skirpichev commented 1 month ago

Could you provide your code? Which gmpy2 version you are using?

Please note, that 4.5*10**9 is not an integer literal: that explains previous traceback. I.e. when you are using your limit variable - it has a wrong type and you got a type error.

Landryj commented 1 month ago

I am running gmpy2==2.2.0 using the code below. When I run it, I get no error and no output!

`import time import gmpy2

def sieve(limit=5000000000): '''Returns a generator that yields the prime numbers up to limit.'''

# Increment by 1 to account for the fact that slices  do not include
# the last index value but we do want to include the last value for
# calculating a list of primes.
sieve_limit = gmpy2.isqrt(limit) + 1
limit += 1

# Mark bit positions 0 and 1 as not prime.
bitmap = gmpy2.xmpz(3)

# Process 2 separately. This allows us to use p+p for the step size
# when sieving the remaining primes.
bitmap[4 : limit : 2] = -1
print(len(bitmap))

# Sieve the remaining primes.
for p in bitmap.iter_clear(3, sieve_limit):
    bitmap[p*p : limit : p+p] = -1

return bitmap.iter_clear(2, limit)

if name == "main": start = time.time() result = list(sieve()) print(time.time() - start) print(len(result)) `

skirpichev commented 1 month ago

This is not, apparently, the code you are trying to run in the issue description. This is just an example from docs with an insane limit value (could you think about memory requirements for such a bitmap?!) and some debugging statements. I see no issue here. Just decrease limit value to fit available memory of some computer on Earth.

Could you, please, show us code which you run before and where you got different error messages?

casevh commented 1 month ago

The fundamental issue is that we are over-flowing the limits of a C long which is always 32 bit for a Windows application. It works fine under WSL which has a 64 bit C long.

The GMP library just does a core dump when it detects an overflow. gmpy2 v2.1.5 links to a C runtime library that outputs the error message gmp: overflow in mpz type. The GMP library used for v2.2.0 doesn't output the error message.

limit is the number of bits in an mpz type. I'll do more research on the actual limits on different platforms.

casevh commented 1 month ago

I have found some other strange behaviors. I will continue to investigate.

Landryj commented 1 month ago

Thanks for investigating. I've gotten various error messages on different machines and on Windows or WSL.

skirpichev commented 1 month ago

The GMP library used for v2.2.0 doesn't output the error message.

It does.

$ ulimit -v $((1024*512))
$ python
Python 3.11.3+ (heads/3.11:9fbb614c4e, Apr 29 2023, 14:18:05) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gmpy2
>>> bitmap = gmpy2.xmpz(3)
>>> bitmap[3:5000000000:2]=-1
GNU MP: Cannot reallocate memory (old_size=8 new_size=625000000)
Aborted

Which is a bit strange, that OP claims he have seen "gmp: overflow in mpz type" (same type error for earlier versions, see #280) and that gmpy2 version is 2.2.

casevh commented 1 month ago

I should be more specific. This is a native Windows issue.

The GMP library used in v2.2.0 for x64 native Windows applications is linked against UCRT. It does not display the error message. The GMP library used in v2.1.5 for x64 native Windows applications is linked against MSVCRT. It does display the error message. The GMP library used on Linux displays the expected error message.

I haven't researched why, but I'm guessing that the GMP error message is sent to stderr and UCRT handles it differently.

In either case, GMP is aborting. We get an error message in most environments but not all. It is a red herring for the real issue - how can we make an x64 native application have the same limits as (for example) an x64 Linux application.

skirpichev commented 1 month ago

It does display the error message.

Nothing at all, i.e. no something like "Aborted" above (this is coming from libc)?

Just in case, are you running Python interpreter from CLI or from some IDE like VS? If later, stderr might be redirected somehow.

Landryj commented 1 month ago

For whatever it's worth, I run large primes using the 'bitstring' module. The Sieve code example can be found at [https://github.com/scott-griffiths/bitstring/blob/main/doc/walkthrough.ipynb]. I modified the code, as shown below, to run on my SurfacePro6 with 8GB RAM. I set the limit is 10,000,000,000. It ran in about 120 seconds.

`#### Landry Note: source is - https://github.com/scott-griffiths/bitstring/blob/main/doc/walkthrough.ipynb

Added 'import bitstring' and 'import math'

Added code for timing

from bitstring import BitArray import math import time

Code_Start_Time = time.time()

Scott Griffiths code below

His code example has 'limit = 100_000_000'

I have increasted it to: 10_000_000_000

Create a BitArray with a hundred million 'one' bits

limit = 10_000_000_000 is_prime = BitArray(limit) is_prime.set(True)

Manually set 0 and 1 to be not prime.

is_prime.set(False, [0, 1])

For every other integer, if it's set as prime then unset all of its multiples

for i in range(2, math.ceil(math.sqrt(limit))): if is_prime[i]: is_prime.set(False, range(i*i, limit, i))

My timing code

Total_Execution_Time = time.time() - Code_Start_Time print('Execution time:', Total_Execution_Time)

end my timing code

print(f"There are {is_prime.count(True)} primes less than {limit},") print(f"the largest one of which is {is_prime.rfind('0b1')[0]}") print(f"and there are {len(list(is_prime.findall('0b101')))} twin primes.")`