emmorais opened this issue 2 months ago
This ticket is based on some questions about integer size, overflow, and rounding during key generation: https://github.com/Inversed-Tech/eyelid/pull/69#discussion_r1582429678
I know in some applications, rounding, lower order bits, and rare edge cases can have security implications. So I wanted to double-check the security requirements here.
Since I last looked into secure sampling for differential privacy, the Mironov attack has been extended to Gaussian distributions: https://arxiv.org/pdf/2107.10138
Here are some potential issues:
`f64` has 53 bits of integer precision, so it can't represent all integers greater than 2^53 exactly.
We might need to choose a specific sampling implementation. For example, there are integer-only implementations that avoid some of the floating-point security issues.
> `f64` has 53 bits of integer precision, so it can't represent all integers greater than 2^53 exactly.
The normal distribution scaling is implemented by multiplying a standard normal distribution by the standard deviation.
This creates a security constraint which is roughly:
`secure_integer_bits > min(53, stdnormal_secure_integer_bits) - ceil(stddev_bits) - 1`
The Ziggurat paper hopefully has some analysis of `stdnormal_secure_integer_bits`: https://www.doornik.com/research/ziggurat.pdf
Very roughly, the implementation discards low-order bits when it rescales [2, 4) to [-1, 1), because the resolution within the [2, 4) exponent is 2^51.

The zero case is slightly different (probability 1 in 2^8): it takes `ln()` of a random value (check this) and compares it against `ziggurat_tables::ZIG_NORM_R`. In particular, if the scaled logarithm of the random value is much less than the constant, only the low bits will be random. (check the probability of this)
I have a thought about the method we're using here that maybe we can discuss: for this application we need a single relatively simple sampling operation, which is to generate one of a small number of small integers from a fixed concrete distribution. Might it be simpler and easier to validate an implementation which just generates a uniform `u128` and translates it to an integer from our distribution directly using pre-computed cut-offs? This would avoid all of the subtleties associated with library implementations and inline floating-point conversions, especially since we don't have any need for the floating-point values in the first place.
Basic idea would be to compute the probabilities of those integers which occur in our distribution with probability greater than 1/2^129, and compute in Sage or another computer algebra system a best approximation of the real distribution over these values using probabilities of the form k/2^128. (Higher resolution could be accomplished by sampling additional `u128` values to represent the probability masses in question.) Then a given random `u128` value could be translated to an appropriate integer sample using just a simple table lookup.
For reference, here are the docs for arbitrary precision floating point values in Sage, which I expect would be suitable for computing high-precision cut-off values for a chosen discrete Gaussian distribution that could then be used directly in Rust code.
Here is the original Mironov paper, which has some good graphs explaining how naive sampling implementations fail: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/10/lsbs.pdf
The implementation we're using is the Ziggurat method, which is much better. (As long as it's implemented correctly.)
An integer weighted sample could be more reliable and more performant, but requires pre-calculation of weights for the distribution. This can be implemented easily using a list of weights, then mapping each weight to its value (and possibly applying a random sign): https://docs.rs/rand/latest/rand/distributions/struct.WeightedIndex.html
> Basic idea would be to compute the probabilities of those integers which occur in our distribution with probability greater than 1/2^129, and compute in Sage or another computer algebra system a best approximation of the real distribution over these values using probabilities of the form k/2^128. (Higher resolution could be accomplished by sampling additional `u128` values to represent the probability masses in question.) Then a given random `u128` value could be translated to an appropriate integer sample using just a simple table lookup.
This is effectively what the Ziggurat method we're using does (except this implementation is in floating point): https://docs.rs/rand_distr/latest/rand_distr/struct.StandardNormal.html and https://www.doornik.com/research/ziggurat.pdf
Lattice errors and keys must be generated using appropriate distributions.
This is a required task before delivering the milestone.