Inversed-Tech / eyelid

Private iris matching
Apache License 2.0

Review security of lattice distribution implementation #70

Open emmorais opened 2 months ago

emmorais commented 2 months ago

Lattice errors and keys must be generated using appropriate distributions.

This is a required task before delivering the milestone.

teor2345 commented 2 months ago

This ticket is based on some questions about integer size, overflow, and rounding during key generation: https://github.com/Inversed-Tech/eyelid/pull/69#discussion_r1582429678

I know in some applications, rounding, lower order bits, and rare edge cases can have security implications. So I wanted to double-check the security requirements here.

Since I last looked into secure sampling for differential privacy, the Mironov attack has been extended to Gaussian distributions: https://arxiv.org/pdf/2107.10138

Here are some potential issues:

f64 has 53 bits of integer precision, so it can’t represent integers greater than 2^53 exactly.
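As a quick illustration of that limit (just a demonstration, not code from the PR):

```rust
fn main() {
    // 2^53 is the largest power of two up to which every integer is exactly
    // representable in an f64 (53-bit significand).
    let exact = 2f64.powi(53); // 9_007_199_254_740_992.0
    let next = exact + 1.0;    // the +1 is lost to rounding
    assert_eq!(exact, next);

    // Below 2^53, adjacent integers are still distinguishable.
    assert_ne!(exact - 1.0, exact);
    println!("f64 stops representing consecutive integers at 2^53 = {exact}");
}
```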

We might need to choose a specific sampling implementation. For example, there are integer-only implementations that avoid some of the floating-point security issues.

teor2345 commented 2 months ago

> f64 has 53 bits of integer precision, so it can’t represent integers greater than 2^53 exactly.

The normal distribution scaling is implemented by multiplying a sample from a standard normal distribution by the standard deviation.
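A minimal sketch of that scaling step (illustrative names only; this is roughly how rand_distr's Normal applies the standard deviation, not our exact code):

```rust
use rand::Rng;
use rand_distr::StandardNormal;

/// Illustrative only: scale a standard-normal sample by the error distribution's
/// standard deviation. The product is rounded back to 53 significant bits, so a
/// large standard deviation pushes the rounding error up towards the integer part.
fn sample_scaled_normal<R: Rng>(rng: &mut R, std_dev: f64) -> f64 {
    let standard: f64 = rng.sample(StandardNormal);
    standard * std_dev
}
```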

This creates a security constraint which is roughly:

secure_integer_bits > min(53, stdnormal_secure_integer_bits) - ceil(stddev_bits) - 1

The Ziggurat paper hopefully has some analysis of stdnormal_secure_integer_bits: https://www.doornik.com/research/ziggurat.pdf

teor2345 commented 2 months ago

Very roughly, the implementation discards:

The zero case is slightly different (probability 1 in 2^8):

In particular, if the scaled logarithm of the random value is much smaller than the constant, only the low bits of the result will be random. (We should check the probability of this.)
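For reference, a self-contained sketch of that zero/tail case, modelled on the Marsaglia–Tsang tail step that the rand_distr Ziggurat uses (the constant and names are my approximation, not the library's exact code):

```rust
use rand::distributions::Open01;
use rand::Rng;

// Tail cut-off for a 256-layer ziggurat over the standard normal (approximate value).
const ZIG_NORM_R: f64 = 3.6541528853610088;

/// Sketch of the "zero case": rejection-sample a point in the tail beyond R.
/// `u` is the uniform value whose sign decides which tail we land in.
fn zero_case<R: Rng + ?Sized>(rng: &mut R, u: f64) -> f64 {
    let mut x = 1.0f64;
    let mut y = 0.0f64;
    while -2.0 * y < x * x {
        let x_: f64 = rng.sample(Open01);
        let y_: f64 = rng.sample(Open01);
        // ln of a value in (0, 1) is negative; dividing by R scales it onto the tail.
        x = x_.ln() / ZIG_NORM_R;
        y = y_.ln();
    }
    // If |x| is much smaller than R, the result is R plus a tiny offset, so only
    // the low significand bits of the output vary between samples.
    if u < 0.0 {
        x - ZIG_NORM_R
    } else {
        ZIG_NORM_R - x
    }
}
```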

bgillesp commented 2 months ago

I have a thought about the method we're using here that maybe we can discuss: for this application we need a single relatively simple sampling operation, which is to generate one of a small number of small integers from a fixed concrete distribution. Might it be simpler and easier to validate an implementation which just generates a uniform u128 and translates it to an integer from our distribution directly using pre-computed cut-offs? This would avoid all of these subtleties associated with library implementations and floating point conversions happening inline, especially since we don't have any need for the floating point values in the first place.

Basic idea would be to compute the probabilities of those integers which occur in our distribution with probability greater than 1/2^129, and compute in Sage or another computer algebra system a best approximation of the real distribution over these values using probabilities of the form k/2^128. (Higher resolution could be accomplished by sampling additional u128 values to represent the probability masses in question.) Then a given random u128 value could be translated to an appropriate integer sample using just a simple table lookup.
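A minimal sketch of the lookup step, assuming the cut-offs have already been computed offline (the table below uses placeholder values, not real cumulative probabilities):

```rust
use rand::Rng;

/// Illustrative only: signed sample values paired with placeholder upper cut-offs
/// of their cumulative probabilities, scaled to the full u128 range. Real cut-offs
/// would be precomputed to full precision (e.g. in Sage) for the chosen distribution.
const TABLE: &[(i64, u128)] = &[
    (-2, 2 << 120),
    (-1, 40 << 120),
    (0, 216 << 120),
    (1, 254 << 120),
    (2, u128::MAX),
];

/// Map a uniform u128 to a sample by finding the first cut-off it falls below.
fn sample_from_table<R: Rng>(rng: &mut R) -> i64 {
    let r: u128 = rng.gen();
    for &(value, cutoff) in TABLE {
        if r <= cutoff {
            return value;
        }
    }
    unreachable!("the last cut-off is u128::MAX");
}
```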

bgillesp commented 2 months ago

For reference, here are the docs for arbitrary precision floating point values in Sage, which I expect would be suitable for computing high-precision cut-off values for a chosen discrete Gaussian distribution that could then be used directly in Rust code.

teor2345 commented 2 months ago

Here is the original Mironov paper, which has some good graphs which explain how naive sampling implementations fail: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/10/lsbs.pdf

The implementation we're using is the Ziggurat method, which is much better. (As long as it's implemented correctly.)

teor2345 commented 2 months ago

An integer weighted sample could be more reliable and more performant, but requires pre-calculation of weights for the distribution. This can be implemented easily using a list of weights, then mapping each weight to its value (and possibly applying a random sign): https://docs.rs/rand/latest/rand/distributions/struct.WeightedIndex.html
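A minimal sketch of that (the weights below are illustrative, roughly proportional to a sigma = 1 Gaussian; real weights would be precomputed at whatever precision we need):

```rust
use rand::distributions::{Distribution, WeightedIndex};

fn main() {
    // Signed sample values and illustrative integer weights (proportional to the
    // standard normal density at |x| = 0..3).
    let values = [-3i64, -2, -1, 0, 1, 2, 3];
    let weights = [4_432u32, 53_991, 241_971, 398_942, 241_971, 53_991, 4_432];

    let dist = WeightedIndex::new(&weights).expect("weights are positive and finite");
    let mut rng = rand::thread_rng();

    // Each sample is an index into `values`, drawn with the given weights.
    let sample = values[dist.sample(&mut rng)];
    println!("sampled error coefficient: {sample}");
}
```

Listing the signed values directly in the table sidesteps the separate random-sign step (and avoids double-weighting zero).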

teor2345 commented 2 months ago

> Basic idea would be to compute the probabilities of those integers which occur in our distribution with probability greater than 1/2^129, and compute in Sage or another computer algebra system a best approximation of the real distribution over these values using probabilities of the form k/2^128. (Higher resolution could be accomplished by sampling additional u128 values to represent the probability masses in question.) Then a given random u128 value could be translated to an appropriate integer sample using just a simple table lookup.

This is effectively what the Ziggurat method we're using does (except that implementation works in floating point): https://docs.rs/rand_distr/latest/rand_distr/struct.StandardNormal.html and https://www.doornik.com/research/ziggurat.pdf