Unbiased random prime generation

fjarri commented 1 year ago

Currently random prime generation starts from a random number and runs a sieve until a prime is found. This can introduce bias, selecting primes with large leads more often. Some assorted considerations:

How dangerous is it, actually? Any sources?
GMP re-samples the start position every 0x10000 samples, quoting "deep science" (with no further explanations). OpenSSL doesn't re-sample.
Just sampling random numbers each time and running BPSW on them makes the generation several times slower.
To avoid touching the RNG many times, we can start with a random a < 2^(k-1), and generate candidates as 2^(k-1) + (a + i * b mod 2^(k-1)) where b is a random odd number. This will uniformly cover all the range [2^(k-1), 2^k) (right?) May be a little faster but not too much.
See https://eprint.iacr.org/2011/481.pdf ("Close to Uniform Prime Number Generation With Fewer Random Bits") for a more advanced algorithm.

fjarri commented 1 year ago

Some investigation results of the algorithm from the paper above (Fig. 2). The steps 1-7 (finding b that's coprime to a product of small primes m) are quite fast, usually b is found in 1-2 iterations. But since m is combined out of a smaller number of primes we use in the sieve, the iteration in the steps 8-11 produces more composites compared to Sieve. For example, for U1024, and \ell = 128, m is composed out of ~120 primes, while Sieve uses 2047 primes. This makes finding a prime about two times slower.

So it doesn't seem like we can just replace the existing sieve, but the method does have value when an unbiased generation is required. Notes:

How do we select b if we're looking for a safe prime? Can we adjust the iteration so that the resulting 2b + 1 (or b/2) is also coprime to m? Otherwise the difference with regular sieving will be huge.
We will have to pre-compute m, lambda(m) and so on for each Uint size we plan to use. Probably put it in a trait.
Note that since we're using Montgomery representation to find b, m must be odd, so b may be even. In the steps 8-11 we can adjust the generated a so that a * m + b is always odd (or = 3 mod 4 for safe primes).
It would be useful to have a "random nonzero DynResidue" method, so that we don't have to convert the generated randoms into Montgomery in the step 4.

kayabaNerve commented 11 months ago

I can note my explicit interest in this for use in a hash-to-prime function. I'd also be curious to benchmark it when it's not using a distinctly sampled random numbers, yet only generating a single new byte/chunk and shifting over the generated bits (questioning how much of the cost is due to generating KB-MB of random numbers).

fjarri commented 11 months ago

I didn't measure it, but it feels like the contribution of RNG is quite minimal.

dvdplm commented 3 weeks ago

How dangerous is it, actually? Any sources?

I think we are safe from this, at least as far as the presets go. Here's why: the dangerous bias enters the game when the same sieve is used many times with the same sequence.

Our sieve starts with a random odd integer k and generates a sequence k, k+2, k+4, …, k+200, … k+2n. Let's say we find a prime with candidate k+200 and then continue searching for more primes in the same sequence of candidates, then the primes found will be biased towards the specific value of k (for instance if k is congruent to 1 mod 3, then all primes from that sequence of candidates will be congruent to 1 mod 3).

In our case however, we exit the loop once a prime is found and discard the Sieve; if the user needs a second prime, they have to generate a new random odd k and this will result in a sequence of candidates equally likely to be congruent to 1 mod 3 as to 2 mod 3.

For use cases where the Sieve is used to pre-sieve a large number of candidates and then "here's a bunch of ints, find me all primes therein", then those primes could display a bias. As far as I can tell, how dangerous such a bias is really depends on what the user do with the primes.

fjarri commented 2 weeks ago

A way to get an unbiased sampling of safe primes given an unbiased sampling of regular primes: https://www.degruyter.com/document/doi/10.1515/jmc-2013-5011/pdf.

entropyxyz / crypto-primes

Unbiased random prime generation #23