entropyxyz / crypto-primes

Random prime generation and primality testing library based on `crypto-bigint`.
https://docs.rs/crypto-primes
Apache License 2.0
19 stars 5 forks source link

Unbiased random prime generation #23

Open fjarri opened 1 year ago

fjarri commented 1 year ago

Currently random prime generation starts from a random number and runs a sieve until a prime is found. This can introduce bias, selecting primes with large leads more often. Some assorted considerations:

fjarri commented 1 year ago

Some investigation results of the algorithm from the paper above (Fig. 2). The steps 1-7 (finding b that's coprime to a product of small primes m) are quite fast, usually b is found in 1-2 iterations. But since m is combined out of a smaller number of primes we use in the sieve, the iteration in the steps 8-11 produces more composites compared to Sieve. For example, for U1024, and \ell = 128, m is composed out of ~120 primes, while Sieve uses 2047 primes. This makes finding a prime about two times slower.

So it doesn't seem like we can just replace the existing sieve, but the method does have value when an unbiased generation is required. Notes:

kayabaNerve commented 11 months ago

I can note my explicit interest in this for use in a hash-to-prime function. I'd also be curious to benchmark it when it's not using a distinctly sampled random numbers, yet only generating a single new byte/chunk and shifting over the generated bits (questioning how much of the cost is due to generating KB-MB of random numbers).

fjarri commented 11 months ago

I didn't measure it, but it feels like the contribution of RNG is quite minimal.

dvdplm commented 3 weeks ago

How dangerous is it, actually? Any sources?

I think we are safe from this, at least as far as the presets go. Here's why: the dangerous bias enters the game when the same sieve is used many times with the same sequence.

Our sieve starts with a random odd integer k and generates a sequence k, k+2, k+4, …, k+200, … k+2n. Let's say we find a prime with candidate k+200 and then continue searching for more primes in the same sequence of candidates, then the primes found will be biased towards the specific value of k (for instance if k is congruent to 1 mod 3, then all primes from that sequence of candidates will be congruent to 1 mod 3).

In our case however, we exit the loop once a prime is found and discard the Sieve; if the user needs a second prime, they have to generate a new random odd k and this will result in a sequence of candidates equally likely to be congruent to 1 mod 3 as to 2 mod 3.

For use cases where the Sieve is used to pre-sieve a large number of candidates and then "here's a bunch of ints, find me all primes therein", then those primes could display a bias. As far as I can tell, how dangerous such a bias is really depends on what the user do with the primes.

fjarri commented 2 weeks ago

A way to get an unbiased sampling of safe primes given an unbiased sampling of regular primes: https://www.degruyter.com/document/doi/10.1515/jmc-2013-5011/pdf.