Open Superhepper opened 3 years ago
This crate uses Num-Bigint, meaning the types are BigUint
. Whats most likely happening is that the 17957 and 5 are the wrong type.
You can also use the is_composite()
to check if a number is composite (so is not prime).
Feel free to use wolfram alpha to double check primality.
Let me know if this helps!
How could they be the wrong type? They are turned into BigUint using into().
And it did not explain why the program hangs when you try to generate a 14 bit prime number.
My guess it is something wrong when it comes to how small prime numbers are identified.
Yes, you are right. I looked into it more and I think I fixed the hang issue when generating 14 bit primes. I still have not been able to fix the identification issue with small primes. I am actually glad you found this as it may mean I can improve performance on getting prime numbers to be generated.
The solution I used to fix the hang is that I added dividing by the number itself.
Looks like the problem is in the miller-rabin primality test. I am looking into it some more.
I wonder if it would be faster to have all the small prime numbers in a hash set instead of an array. Then the first thing you could do is this:
if small_prime_hash_set.contains(p) {
return true;
}
A quick look at the code makes me think the issue with identifying small primes is in div_small_primes(numb: &BigUint)
. That function checks if numb % p == 0
however, it does not check for numb == p
, so it incorrectly labels the small prime as non-prime because the remainder is 0.
Yes, you are right. I looked into it more and I think I fixed the hang issue when generating 14 bit primes. I still have not been able to fix the identification issue with small primes. I am actually glad you found this as it may mean I can improve performance on getting prime numbers to be generated.
Solution For Hang
The solution I used to fix the hang is that I added dividing by the number itself.
Note that this change still causes the small number to first be found 'non-prime' because its remainder is 0, the second check will never run for numb
when it is one of the small primes being tested. I also wonder if division is the best (most efficient) way to check this or if it's possible to use ==
here? I can imagine some cases where that may not work, but I'm not sure those are relevant here.
The problem is not with the div_small_primes()
. It is with the miller-rabin implementation (i tested it out and I can verify this).
I am not certain that the problem is only with small prime numbers. It may also be with larger prime numbers. Since the library is mainly for generation, there is not really a way to tell besides running tests with larger prime numbers. It seems that if the issue was resolved, it may speed up generation as well (not certain of this though).
I cannot find the issue with it. Any help would be greatly appreciated. Pull Requests are welcome.
I just read the new comment above the function and see that the return value does match the expectations for div_small_primes()
. The test for bug1 is still missing an assert!(...)
though, which would be good to add IMHO. In any case, sorry for the misinterpretation.
@vdeurzen Any luck with fixing anything with miller-rabin's implementation? I cannot seem to fix it. It seems that the problem may be making it mark some probable primes as composite which would reduce the speed by a lot. I am trying to improve performance so it matches that of OpenSSL's prime number generation.
What may help is looking at different miller-rabin implementations. For example, here are some resources I have been using:
I have spent two hours debugging it to see what the problem is. Here is what I have learned so far:
It does label some small prime numbers as prime. You can test this by passing in 79
for the prime and 17957
for the composite (although it is actually a prime number)
The num library, unlike RAMP, does not allow you to set individual bits. This causes a decrease in performance as you can not set a value to odd by simply changing a single bit.
Here is what I have done:
Refactored code and renamed a few variables
[Miller-Rabin Implementation] Changed 1..step
to 0..step
Added Logging Ability
I would greatly appreciate it if you looked at the code (including anyone else who reads this)
I'd like to provide a data point where a large prime number is not recognized as such by num-primes
.
The following prime number was generated with ssh-keygen
and is not prime according to num-primes
:
169511182982703321453314585423962898651587669459838234386506572286328885534468792292646838949809616446341407457141008401355628947670484184607678853094537849610289912805960069455687743151708433319901176932959509872662610091644590437761688516626993416011399330087939042347256922771590903190536793274742859624657
The Rust code:
use num_bigint::BigUint;
use num_primes::Verification;
fn main() {
let prime_as_big_uint = BigUint::parse_bytes(b"169511182982703321453314585423962898651587669459838234386506572286328885534468792292646838949809616446341407457141008401355628947670484184607678853094537849610289912805960069455687743151708433319901176932959509872662610091644590437761688516626993416011399330087939042347256922771590903190536793274742859624657", 10).unwrap();
let is_prime: bool = Verification::is_prime(&prime_as_big_uint);
// Says the number is not prime
println!("It's prime: {}", is_prime);
}
Inside Cargo.toml:
[dependencies]
num-bigint = "0.2.6"
num-primes = "0.2.0"
On the other hand, I also checked the number with the Python library primality, which I installed with pip install primality
:
from primality import primality
primality.isprime(169511182982703321453314585423962898651587669459838234386506572286328885534468792292646838949809616446341407457141008401355628947670484184607678853094537849610289912805960069455687743151708433319901176932959509872662610091644590437761688516626993416011399330087939042347256922771590903190536793274742859624657)
Which yields True
.
Thank you so much for adding a data point. The conclusion I get from this is that only a certain set of primes are passing the primality test.
I have looked through the code and cannot find the issue. Any help would be greatly appreciated.
If this bug is fixed, my guess is prime number generation can be on par with OpenSSL's speed.
Okay can everyone verify that it is working correctly so I can close this. Thank you all for your help.
The primality generation is still slow and not as fast as OpenSSL's generation of primes so if anyone knows anyway to speed things up it would be much appreciated.
I am working on running a flamegraph but it doesn't seem to like to play nice with WSL. With that, I did do some timestamp testing and I suspect that the bottle next is inside the bigint library itself. I suspect https://github.com/rust-num/num-bigint/issues/169 would be a big help.
Outside of the existing big-int library, there are some faster ones but it would require a decent amount of work to do the conversion. iBig looks like a good fit if we do decide to go that route.
We probably should make another issues to track the performance.
https://crates.io/crates/bigint-benchmark
I was able to get about ~15% speed improvement by moving some of the .to_usize().unwrap() executions outside of the loop so I can get a PR for that in the meantime. I am still doing some testing to make sure its not just randomness causing the differences.
Edit: It also looks like bumping all the rand and num dependencies has a massive impact on the performance. There must be some new optimizations.
Still nowhere near openssl but I believe all of the runtime fat is inside the random number generation and exponentiation of the bit ints.
I placed the 2048 bit results below but I tried it with 256, 512, and 2048 bit primes and, with equal iters, the rust version is about 4x slower across the board. Note that openssl always uses 64 iters so keep that in mind when comparing with the rust version's variable iters.
Before: 100 runs (2048 bit prime, 8 iters)
num-primes per run average: 2.09713327s
openssl per run average: 255.9ms
After: 100 runs (2048 bit prime, 8 iters)
num-primes per run average: 670.486754ms
openssl per run average: 255.9ms
After: 100 runs(2048 bit prime, 64 iters)
num-primes per run average: 966.052487ms
openssl per run average: 255.9ms
I also have ramp-primes which uses RAMP instead of num. However, it currently is broken. It does seem to run faster than num too because it uses assembly.
I am really interested in fixing this project up but have been busy lately. Feel free to edit the code some more.
Has this been fixed? Otherwise I can pull it and fix it.
Has this been fixed? Otherwise I can pull it and fix it.
Somewhat. I believe it does rightly mark primes but you can try it yourself. Another thing that needs to be worked on is the speed for generating primes. OpenSSL is super fast at this.
num-primes is much slower.
Has this been fixed? Otherwise I can pull it and fix it.
Somewhat. I believe it does rightly mark primes but you can try it yourself. Another thing that needs to be worked on is the speed for generating primes. OpenSSL is super fast at this.
num-primes is much slower.
num-bigint is absymal performance. You can also check divisibility a bit faster than trying to use divide on Bigint but that would require writing your own arbitrary-precision library.
I found this thread when I tried to verify that some small numbers were prime and got unexpected results and/or my program panicked. For example, if I want to verify that 3 is a prime, I got false (is not a prime) or a panic in the Miller-Rabin-Test.
From what I can see, the
if numb / &BigUint::from(*p) == one {
line in div_small_primes
is part of the problem. When run with numb = 3
and p = 2
this evalutes to 1 and thus div_small_primes
returns true which in turn is_prime
interprets as "all good, go on". This should be a problem for a lot of small valid primes, not only three.
Note that the comment
// if true, then is not prime // if false, then maybe prime
above div_small_primes
seems to be wrong. At least is_prime
uses the values in the opposite semantics.
So this causes the input 3 to pass the initial check in is_prime
. Then it gets passed into the fermat
check. Because the input is so small, we have a real chance that
let random: BigUint = rng.gen_biguint_below(candidate);
evalutes to zero, which lets the fermat check fail and thus 3 gets evaluated to not being prime. This is only a problem for very small primes, otherwise the chances of random
being zero are too small to matter.
If the fermat
check passes, then the input three gets passed to the miller_rabin
check which fails because of an assert when
let a = rng.gen_biguint_range(&two, &(candidate-&one));
is called with a range between 2 and 2.
But the last part is only a problem if the input is three, which it never should be if the div_small_primes
check was working problery. I believe that the first two bugs are part of the problem why some valid primes get rejected.
Use @cmpute 's num-prime or my number-theory (ENT), they are much faster and verified to a extremely high probability of accuracy. (number-theory is weighted to a minimum of 1/2^64 failure rate (primarily the rare fermat number), haven't verified cmpute's implementation)
When I wrote a little program to check out how this crate works and I found something strange. Now I am not an expert in prime numbers so I might have done something wrong.
And the result I got was:
I expected both 17957 and 5 to be prime numbers. So I tried to generate a 8 bit prime number but then the program hang. I then tried to find the limit of the minimum number of bits for which a prime number could be generated and found that
Generator::new_prime(15)
worked fine butGenerator::new_prime(14)
made the program hang (probably stuck in an infinite loop some where).