AtropineTears / num-primes

A Rust Library For Generating Large Composite, Prime, and Safe Prime Numbers
Apache License 2.0
6 stars 7 forks source link

Prime number verification issue. #1

Open Superhepper opened 3 years ago

Superhepper commented 3 years ago

When I wrote a little program to check out how this crate works and I found something strange. Now I am not an expert in prime numbers so I might have done something wrong.

use num_primes::*;

fn main() {
    let numbers = [
        Generator::new_prime(512),
        Generator::new_prime(32),
        Generator::new_prime(16),
        17957u32.into(),
        5u32.into(),
    ];

    for number in numbers {
        if Verification::is_prime(&number) {
            println!("{} is a prime number", number);
        } else {
            println!("{} is not a prime number", number);
        }
    }
}

And the result I got was:

7143985141390067685062171960235716944255257884955901307961617824283167357210788588888340513160296160835326595412561271828611378926148900988701377840767267 is a prime number
2161366811 is a prime number
49331 is a prime number
17957 is not a prime number
5 is not a prime number

I expected both 17957 and 5 to be prime numbers. So I tried to generate a 8 bit prime number but then the program hang. I then tried to find the limit of the minimum number of bits for which a prime number could be generated and found that Generator::new_prime(15) worked fine but Generator::new_prime(14) made the program hang (probably stuck in an infinite loop some where).

AtropineTears commented 3 years ago

This crate uses Num-Bigint, meaning the types are BigUint. Whats most likely happening is that the 17957 and 5 are the wrong type.

You can also use the is_composite() to check if a number is composite (so is not prime).

Feel free to use wolfram alpha to double check primality.

Let me know if this helps!

Superhepper commented 3 years ago

How could they be the wrong type? They are turned into BigUint using into().

And it did not explain why the program hangs when you try to generate a 14 bit prime number.

Superhepper commented 3 years ago

My guess it is something wrong when it comes to how small prime numbers are identified.

AtropineTears commented 3 years ago

Yes, you are right. I looked into it more and I think I fixed the hang issue when generating 14 bit primes. I still have not been able to fix the identification issue with small primes. I am actually glad you found this as it may mean I can improve performance on getting prime numbers to be generated.

Solution For Hang

https://github.com/AtropineTearz/num-primes/blob/105da2d4485def483b943ded4e5c103914be20b4/src/lib.rs#L290

The solution I used to fix the hang is that I added dividing by the number itself.

AtropineTears commented 3 years ago

Looks like the problem is in the miller-rabin primality test. I am looking into it some more.

Superhepper commented 3 years ago

I wonder if it would be faster to have all the small prime numbers in a hash set instead of an array. Then the first thing you could do is this:

if small_prime_hash_set.contains(p) {
    return true;
}
vdeurzen commented 3 years ago

A quick look at the code makes me think the issue with identifying small primes is in div_small_primes(numb: &BigUint). That function checks if numb % p == 0 however, it does not check for numb == p, so it incorrectly labels the small prime as non-prime because the remainder is 0.

vdeurzen commented 3 years ago

Yes, you are right. I looked into it more and I think I fixed the hang issue when generating 14 bit primes. I still have not been able to fix the identification issue with small primes. I am actually glad you found this as it may mean I can improve performance on getting prime numbers to be generated.

Solution For Hang

https://github.com/AtropineTearz/num-primes/blob/105da2d4485def483b943ded4e5c103914be20b4/src/lib.rs#L290

The solution I used to fix the hang is that I added dividing by the number itself.

Note that this change still causes the small number to first be found 'non-prime' because its remainder is 0, the second check will never run for numb when it is one of the small primes being tested. I also wonder if division is the best (most efficient) way to check this or if it's possible to use == here? I can imagine some cases where that may not work, but I'm not sure those are relevant here.

AtropineTears commented 3 years ago

The problem is not with the div_small_primes(). It is with the miller-rabin implementation (i tested it out and I can verify this).

I am not certain that the problem is only with small prime numbers. It may also be with larger prime numbers. Since the library is mainly for generation, there is not really a way to tell besides running tests with larger prime numbers. It seems that if the issue was resolved, it may speed up generation as well (not certain of this though).

I cannot find the issue with it. Any help would be greatly appreciated. Pull Requests are welcome.

vdeurzen commented 3 years ago

I just read the new comment above the function and see that the return value does match the expectations for div_small_primes(). The test for bug1 is still missing an assert!(...) though, which would be good to add IMHO. In any case, sorry for the misinterpretation.

AtropineTears commented 3 years ago

@vdeurzen Any luck with fixing anything with miller-rabin's implementation? I cannot seem to fix it. It seems that the problem may be making it mark some probable primes as composite which would reduce the speed by a lot. I am trying to improve performance so it matches that of OpenSSL's prime number generation.

What may help is looking at different miller-rabin implementations. For example, here are some resources I have been using:

AtropineTears commented 3 years ago

I have spent two hours debugging it to see what the problem is. Here is what I have learned so far:

  1. It does label some small prime numbers as prime. You can test this by passing in 79 for the prime and 17957 for the composite (although it is actually a prime number)

  2. The num library, unlike RAMP, does not allow you to set individual bits. This causes a decrease in performance as you can not set a value to odd by simply changing a single bit.

Here is what I have done:

I would greatly appreciate it if you looked at the code (including anyone else who reads this)

mb720 commented 2 years ago

I'd like to provide a data point where a large prime number is not recognized as such by num-primes.

The following prime number was generated with ssh-keygen and is not prime according to num-primes:

169511182982703321453314585423962898651587669459838234386506572286328885534468792292646838949809616446341407457141008401355628947670484184607678853094537849610289912805960069455687743151708433319901176932959509872662610091644590437761688516626993416011399330087939042347256922771590903190536793274742859624657

The Rust code:

use num_bigint::BigUint;
use num_primes::Verification;

fn main() {

  let prime_as_big_uint = BigUint::parse_bytes(b"169511182982703321453314585423962898651587669459838234386506572286328885534468792292646838949809616446341407457141008401355628947670484184607678853094537849610289912805960069455687743151708433319901176932959509872662610091644590437761688516626993416011399330087939042347256922771590903190536793274742859624657", 10).unwrap();
  let is_prime: bool = Verification::is_prime(&prime_as_big_uint);

  // Says the number is not prime
  println!("It's prime: {}", is_prime);
}

Inside Cargo.toml:

[dependencies]
num-bigint = "0.2.6"
num-primes = "0.2.0"

On the other hand, I also checked the number with the Python library primality, which I installed with pip install primality:

from primality import primality
primality.isprime(169511182982703321453314585423962898651587669459838234386506572286328885534468792292646838949809616446341407457141008401355628947670484184607678853094537849610289912805960069455687743151708433319901176932959509872662610091644590437761688516626993416011399330087939042347256922771590903190536793274742859624657)

Which yields True.

AtropineTears commented 2 years ago

Thank you so much for adding a data point. The conclusion I get from this is that only a certain set of primes are passing the primality test.

I have looked through the code and cannot find the issue. Any help would be greatly appreciated.

If this bug is fixed, my guess is prime number generation can be on par with OpenSSL's speed.

AtropineTears commented 2 years ago

Okay can everyone verify that it is working correctly so I can close this. Thank you all for your help.

The primality generation is still slow and not as fast as OpenSSL's generation of primes so if anyone knows anyway to speed things up it would be much appreciated.

evrhines commented 2 years ago

I am working on running a flamegraph but it doesn't seem to like to play nice with WSL. With that, I did do some timestamp testing and I suspect that the bottle next is inside the bigint library itself. I suspect https://github.com/rust-num/num-bigint/issues/169 would be a big help.

Outside of the existing big-int library, there are some faster ones but it would require a decent amount of work to do the conversion. iBig looks like a good fit if we do decide to go that route.

We probably should make another issues to track the performance.

https://crates.io/crates/bigint-benchmark

I was able to get about ~15% speed improvement by moving some of the .to_usize().unwrap() executions outside of the loop so I can get a PR for that in the meantime. I am still doing some testing to make sure its not just randomness causing the differences.

Edit: It also looks like bumping all the rand and num dependencies has a massive impact on the performance. There must be some new optimizations.

Still nowhere near openssl but I believe all of the runtime fat is inside the random number generation and exponentiation of the bit ints.

I placed the 2048 bit results below but I tried it with 256, 512, and 2048 bit primes and, with equal iters, the rust version is about 4x slower across the board. Note that openssl always uses 64 iters so keep that in mind when comparing with the rust version's variable iters.

Before: 100 runs (2048 bit prime, 8 iters)

num-primes per run average: 2.09713327s
openssl per run average: 255.9ms

After: 100 runs (2048 bit prime, 8 iters)

num-primes per run average: 670.486754ms
openssl per run average: 255.9ms

After: 100 runs(2048 bit prime, 64 iters)

num-primes per run average: 966.052487ms
openssl per run average: 255.9ms
AtropineTears commented 2 years ago

I also have ramp-primes which uses RAMP instead of num. However, it currently is broken. It does seem to run faster than num too because it uses assembly.

I am really interested in fixing this project up but have been busy lately. Feel free to edit the code some more.

JASory commented 2 years ago

Has this been fixed? Otherwise I can pull it and fix it.

AtropineTears commented 2 years ago

Has this been fixed? Otherwise I can pull it and fix it.

Somewhat. I believe it does rightly mark primes but you can try it yourself. Another thing that needs to be worked on is the speed for generating primes. OpenSSL is super fast at this.

num-primes is much slower.

JASory commented 2 years ago

Has this been fixed? Otherwise I can pull it and fix it.

Somewhat. I believe it does rightly mark primes but you can try it yourself. Another thing that needs to be worked on is the speed for generating primes. OpenSSL is super fast at this.

num-primes is much slower.

num-bigint is absymal performance. You can also check divisibility a bit faster than trying to use divide on Bigint but that would require writing your own arbitrary-precision library.

Mietzsch commented 2 years ago

I found this thread when I tried to verify that some small numbers were prime and got unexpected results and/or my program panicked. For example, if I want to verify that 3 is a prime, I got false (is not a prime) or a panic in the Miller-Rabin-Test.

From what I can see, the if numb / &BigUint::from(*p) == one { line in div_small_primes is part of the problem. When run with numb = 3 and p = 2 this evalutes to 1 and thus div_small_primes returns true which in turn is_prime interprets as "all good, go on". This should be a problem for a lot of small valid primes, not only three.

Note that the comment // if true, then is not prime // if false, then maybe prime above div_small_primes seems to be wrong. At least is_prime uses the values in the opposite semantics.

So this causes the input 3 to pass the initial check in is_prime. Then it gets passed into the fermat check. Because the input is so small, we have a real chance that let random: BigUint = rng.gen_biguint_below(candidate); evalutes to zero, which lets the fermat check fail and thus 3 gets evaluated to not being prime. This is only a problem for very small primes, otherwise the chances of random being zero are too small to matter.

If the fermat check passes, then the input three gets passed to the miller_rabin check which fails because of an assert when let a = rng.gen_biguint_range(&two, &(candidate-&one)); is called with a range between 2 and 2.

But the last part is only a problem if the input is three, which it never should be if the div_small_primes check was working problery. I believe that the first two bugs are part of the problem why some valid primes get rejected.

JASory commented 2 years ago

Use @cmpute 's num-prime or my number-theory (ENT), they are much faster and verified to a extremely high probability of accuracy. (number-theory is weighted to a minimum of 1/2^64 failure rate (primarily the rare fermat number), haven't verified cmpute's implementation)