fjarri opened 1 year ago
Some investigation results of the algorithm from the paper above (Fig. 2). Steps 1-7 (finding `b` that's coprime to a product of small primes `m`) are quite fast; usually `b` is found in 1-2 iterations. But since `m` is built from a smaller number of primes than we use in the sieve, the iteration in steps 8-11 produces more composites compared to `Sieve`. For example, for `U1024` and `ℓ = 128`, `m` is composed of ~120 primes, while `Sieve` uses 2047 primes. This makes finding a prime about two times slower.
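For concreteness, here's a toy version of the `b`-finding loop (steps 1-7) as I understand it, with `u64` and a tiny xorshift PRNG standing in for `Uint` and a real RNG; `find_coprime_b` is a hypothetical name, not anything in the crate:

```rust
// Sketch of steps 1-7: m is a product of small odd primes, and we sample
// random b until gcd(b, m) == 1. Small-scale version; the real thing
// would work over Uint with a CryptoRng.

fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

// Tiny xorshift PRNG so the sketch has no external dependencies.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

// Returns a b coprime to m, plus how many iterations it took.
fn find_coprime_b(m: u64, seed: u64) -> (u64, u32) {
    let mut state = seed;
    let mut iterations = 0;
    loop {
        iterations += 1;
        let b = xorshift(&mut state) % m;
        if b != 0 && gcd(b, m) == 1 {
            return (b, iterations);
        }
    }
}

fn main() {
    // m = 3 * 5 * 7 * 11 * 13, a toy analogue of the ~120-prime product.
    let m: u64 = 3 * 5 * 7 * 11 * 13;
    let (b, iters) = find_coprime_b(m, 0x9E3779B97F4A7C15);
    assert_eq!(gcd(b, m), 1);
    // phi(m)/m ≈ 0.44 here, so a couple of iterations is typical,
    // matching the "b is found in 1-2 iterations" observation.
    println!("b = {b}, found in {iters} iteration(s)");
}
```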
So it doesn't seem like we can just replace the existing sieve, but the method does have value when unbiased generation is required. Notes:

- What do we do about `b` if we're looking for a safe prime? Can we adjust the iteration so that the resulting `2b + 1` (or `b / 2`) is also coprime to `m`? Otherwise the difference with regular sieving will be huge.
- We'll need to pre-calculate `m`, `lambda(m)` and so on for each `Uint` size we plan to use. Probably put it in a trait.
- `m` must be odd, so `b` may be even. In steps 8-11 we can adjust the generated `a` so that `a * m + b` is always odd (or `≡ 3 mod 4` for safe primes).
- We could use a "random `DynResidue`" method, so that we don't have to convert the generated randoms into Montgomery form in step 4.

I can note my explicit interest in this for use in a hash-to-prime function. I'd also be curious to benchmark it when it's not using distinctly sampled random numbers, but only generating a single new byte/chunk and shifting over the generated bits (questioning how much of the cost is due to generating KB-MB of random numbers).
I didn't measure it, but it feels like the contribution of the RNG is quite minimal.
How dangerous is it, actually? Any sources?
I think we are safe from this, at least as far as the presets
go. Here's why: the dangerous bias enters the game when the same sieve is used many times with the same sequence.
Our sieve starts with a random odd integer `k` and generates the sequence `k, k+2, k+4, …, k+200, …, k+2n`. Say we find a prime at candidate `k+200` and then continue searching for more primes in the same sequence of candidates: the primes found will be biased towards the specific value of `k` (for instance, if `k` is congruent to `1 mod 3`, then all primes from that sequence of candidates will be congruent to `1 mod 3`).
In our case, however, we exit the loop once a prime is found and discard the `Sieve`; if the user needs a second prime, they have to generate a new random odd `k`, and this results in a sequence of candidates equally likely to be congruent to `1 mod 3` as to `2 mod 3`.
For use cases where the `Sieve` is used to pre-sieve a large number of candidates and then "here's a bunch of ints, find me all primes therein", those primes could display a bias. As far as I can tell, how dangerous such a bias is really depends on what the user does with the primes.
A way to get an unbiased sampling of safe primes given an unbiased sampling of regular primes: https://www.degruyter.com/document/doi/10.1515/jmc-2013-5011/pdf.
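If I'm reading that construction right, it amounts to rejection sampling: draw `q` from an unbiased sampler of regular primes and accept only when `2q + 1` is also prime. A toy `u64` sketch (trial division, and a fixed list standing in for the real unbiased sampler; all names here are mine, not from the paper or the crate):

```rust
// Turning an unbiased sampler of primes q into a sampler of safe primes
// p = 2q + 1. Rejection keeps the output uniform over the safe primes in
// range, assuming the q-sampler is uniform.

fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

fn sample_safe_prime(mut next_prime: impl FnMut() -> u64) -> u64 {
    loop {
        let q = next_prime();
        let p = 2 * q + 1;
        if is_prime(p) {
            return p; // p is a safe prime, q its Sophie Germain prime
        }
    }
}

fn main() {
    // Stand-in for an unbiased prime sampler: cycle through a fixed list.
    let primes = [101u64, 103, 107, 109, 113, 127, 131, 137, 139, 149];
    let mut i = 0;
    let sampler = move || {
        let q = primes[i % primes.len()];
        i += 1;
        q
    };
    // 101, 103, 107, 109 are rejected (2q + 1 composite); q = 113 gives
    // the safe prime p = 227.
    let p = sample_safe_prime(sampler);
    assert!(is_prime(p) && is_prime((p - 1) / 2));
    println!("safe prime: {p}");
}
```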
Currently, random prime generation starts from a random number and runs a sieve until a prime is found. This can introduce bias, selecting primes preceded by large gaps more often. Some assorted considerations:
- Pick `a < 2^(k-1)` and generate candidates as `2^(k-1) + (a + i * b mod 2^(k-1))`, where `b` is a random odd number. This will uniformly cover the whole range `[2^(k-1), 2^k)` (right?). May be a little faster, but not by much.
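The coverage claim should hold because an odd `b` is invertible modulo `2^(k-1)`, so `i ↦ (a + i * b) mod 2^(k-1)` is a bijection over `i = 0..2^(k-1)`. A small-scale check (hypothetical `candidates` helper, `u64` in place of `Uint`, `k = 8`):

```rust
use std::collections::HashSet;

// Candidate generator from the proposal: 2^(k-1) + ((a + i*b) mod 2^(k-1)).
// With b odd, i = 0..2^(k-1) visits every residue exactly once, so the
// candidates cover all of [2^(k-1), 2^k) with no repeats.
fn candidates(k: u32, a: u64, b: u64) -> Vec<u64> {
    let half = 1u64 << (k - 1);
    (0..half).map(|i| half + ((a + i * b) % half)).collect()
}

fn main() {
    let k = 8;
    let half = 1u64 << (k - 1);
    // a and b are arbitrary here; b just has to be odd for full coverage.
    let cands: HashSet<u64> = candidates(k, 37, 101).into_iter().collect();
    let expected: HashSet<u64> = (half..(1u64 << k)).collect();
    assert_eq!(cands, expected); // the whole range is covered, no repeats
    println!("all {} candidates in [{}, {}) covered", cands.len(), half, 1u64 << k);
}
```

(With an even `b` the map is no longer a bijection and part of the range is never visited, which is presumably why `b` must be odd.)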