dpsa4fl / overview

Differential Privacy for Federated Learning with Secure Aggregation

Designing and understanding the privacy properties of our algorithm #24

Open MxmUrw opened 1 year ago

MxmUrw commented 1 year ago

There are two approaches for generating noise:

1. Sample from the "float" gaussian distribution

2. Sample from the discrete gaussian distribution

Papers:

  1. Deep Learning with Differential Privacy. here
  2. The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation. here

Questions:

ooovi commented 1 year ago

this explains why approach 2. (sampling from the discrete gaussian distribution) results in a DP mechanism.

for all c clients:
   1. compute gradient and clip to L2-norm 1
   2. round towards 0 componentwise to get n-bit fixedpoint in (-1, 1)
   3. project to integers in (0, 2^n) using p(x) = 2^(n-1)*x + 2^(n-1) componentwise
4. sum all gradients
5. add discrete gaussian (0, sigma²) noise componentwise
6. modulo to obtain a field element
7. project to float using pp(y) = 2^(1-n) * y - c
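
the steps above might look roughly like this in python. this is a standalone sketch, not repo code: the discrete gaussian sampler here is a naive truncated inverse-CDF sampler for illustration only, and the noising is done centrally rather than on secret shares.

```python
import math
import random

def sample_discrete_gaussian(sigma, tail=20):
    # naive sampler over a truncated support, for illustration only --
    # the papers use exact rejection samplers instead
    support = list(range(-int(tail * sigma) - 1, int(tail * sigma) + 2))
    weights = [math.exp(-(z * z) / (2 * sigma * sigma)) for z in support]
    return random.choices(support, weights=weights)[0]

def client_encode(grad, n):
    # steps 1.-3.: clip to L2-norm 1, round towards 0 to an n-bit
    # fixedpoint, project to integers in (0, 2^n)
    norm = math.sqrt(sum(g * g for g in grad))
    clipped = [g / max(1.0, norm) for g in grad]
    scale = 2 ** (n - 1)
    fixed = [math.trunc(g * scale) for g in clipped]   # round towards 0
    return [f + scale for f in fixed]                  # p(x) = 2^(n-1)*x + 2^(n-1)

def aggregate(encoded, n, sigma, q):
    # steps 4.-7.: sum all gradients, add discrete gaussian noise
    # componentwise, reduce mod q, project back with pp(y) = 2^(1-n)*y - c
    c = len(encoded)
    summed = [sum(col) for col in zip(*encoded)]
    noised = [(s + sample_discrete_gaussian(sigma)) % q for s in summed]
    return [2.0 ** (1 - n) * y - c for y in noised]
```

with small noise the output is close to the true sum of the clipped gradients, up to the fixedpoint rounding error of at most 2^(1-n) per client per component.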

privacy

steps 1.-4. describe a function to the integers. since each input gradient satisfies $\lVert x\rVert_2 \leq 1$ after the clipping in step 1., exchanging any one clipped input gradient $x$ for any other $x'$ (and hence replacing any one entry in any client's data set by a different entry) changes the result of the computation up to step 4. by at most

$$\left\lVert \sum_{y \in X} \bar p(y) - \sum_{y \in X'} \bar p(y)\right\rVert_2 = \lVert\bar p(x) - \bar p(x')\rVert_2 \leq 2^{n-1} \cdot(\lVert x\rVert_2 + \lVert x'\rVert_2) \leq 2^n$$

where $\bar p$ is the componentwise application of the projection $p$ to integers described in the code, and $X$ and $X'$ are the input gradient sets with $x$ and $x'$ exchanged and all other inputs the same. hence the sensitivity of steps 1.-4. is at most $2^n$. for any $\epsilon>0$ and $\sigma = \frac{2^n}{\epsilon}$, step 5. makes the whole procedure $\frac{1}{2} \epsilon^2$-zero-concentrated differentially private (https://arxiv.org/pdf/2004.00010.pdf theorem 14 with all $\sigma_j = \sigma$). hence to obtain $\rho$-zCDP, we should choose $\sigma = \frac{2^n}{\sqrt{2\rho}}$.
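
a quick numeric sanity check of the sensitivity bound and the $\sigma$ calibration (a standalone sketch, not repo code):

```python
import math
import random

def p_bar(x, n):
    # componentwise projection p(x) = 2^(n-1)*x + 2^(n-1) from the steps above
    return [2 ** (n - 1) * xi + 2 ** (n - 1) for xi in x]

def l2(v):
    return math.sqrt(sum(vi * vi for vi in v))

n = 16
random.seed(0)
for _ in range(1000):
    # draw two clipped gradients with L2-norm <= 1
    x = [random.uniform(-1, 1) for _ in range(8)]
    x = [xi / max(1.0, l2(x)) for xi in x]
    xp = [random.uniform(-1, 1) for _ in range(8)]
    xp = [xi / max(1.0, l2(xp)) for xi in xp]
    # the offsets 2^(n-1) cancel, so the difference is 2^(n-1) * (x - x')
    diff = [a - b for a, b in zip(p_bar(x, n), p_bar(xp, n))]
    assert l2(diff) <= 2 ** n  # sensitivity bound from the displayed inequality

# sigma calibration for rho-zCDP
rho = 0.5
sigma = 2 ** n / math.sqrt(2 * rho)
```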

steps 6.-7. don't affect privacy due to post processing invariance of zCDP (https://arxiv.org/pdf/1605.02065.pdf lemma 1.8).

we need to make sure we're doing the rounding as described in step 2 (#28).

also, this is the privacy guarantee for one training step only. zCDP composes nicely (https://arxiv.org/pdf/1605.02065.pdf lemma 2.3), so performing $k$ iterations of the above algorithm using the same input data sets for the clients will yield a $k\rho$-zCDP algorithm.
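
the composition arithmetic can be sketched as follows. helper names are hypothetical; the zCDP-to-DP conversion is the standard one from Bun & Steinke (https://arxiv.org/pdf/1605.02065.pdf proposition 1.3):

```python
import math

def sigma_per_step(n, k, rho_total):
    # k iterations compose to rho_total-zCDP if each step is
    # (rho_total / k)-zCDP, so by the calibration above:
    # sigma = 2^n / sqrt(2 * rho_total / k)
    return 2 ** n / math.sqrt(2 * rho_total / k)

def zcdp_to_dp(rho, delta):
    # rho-zCDP implies (rho + 2*sqrt(rho*log(1/delta)), delta)-DP
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))
```

note that splitting a fixed privacy budget over more iterations forces a larger per-step sigma, which is the usual privacy/utility trade-off in iterative DP training.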

federation

steps 1.-3. are done locally by the clients. the resulting integers are secret shared among the aggregators, which verify that the clipping was done correctly using our vdaf. they perform steps 4.-7. on the ciphertext in a distributed manner, except for the noising in step 5., which each aggregator does using plaintext noise known only to itself. each aggregator has to add the full amount of noise: the aggregators are assumed to be honest but curious, so one of them could reconstruct a less-noised version of the result if the other did not add all the noise necessary for the DP guarantee. the field used for this part of the computation is chosen large enough to avoid modulo wraparound in the sum computation, which allows us to compute the sensitivity as if it were a function on the integers.
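
a toy sketch of the additive-sharing part with two aggregators, scalars instead of vectors, and placeholder field size and noise values (not the repo's actual field or sampler):

```python
import random

q = 2 ** 61 - 1  # placeholder prime field modulus

def share(value):
    # additive secret sharing: value = s0 + s1 mod q
    s0 = random.randrange(q)
    return s0, (value - s0) % q

def aggregator_sum(shares, noise):
    # each aggregator sums its shares and adds its own full noise term
    return (sum(shares) + noise) % q

# two clients, already encoded as field elements (steps 1.-3.)
clients = [12345, 67890]
shares0, shares1 = zip(*(share(v) for v in clients))

noise0, noise1 = 3, -2  # stand-ins for discrete gaussian samples
out0 = aggregator_sum(shares0, noise0)
out1 = aggregator_sum(shares1, noise1)

# recombining yields the true sum plus *both* noise terms,
# so a single honest-but-curious aggregator never sees a
# less-noised result than the DP guarantee requires
assert (out0 + out1) % q == (sum(clients) + noise0 + noise1) % q
```

since both noise terms survive into the recombined result, the released sum carries twice the per-aggregator noise variance, which is the price of tolerating one curious aggregator.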