OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0
275 stars 77 forks source link

PLD based Budget Accountant #35

Closed dvadym closed 2 years ago

dvadym commented 3 years ago

Context

The privacy budget accounting is an important feature of the DP aggregation system. On the issue BudgetAccountant was implemented. The BudgetAccountant uses the naive composition, namely the total budget is the sum of budgets:

(๐œบ_1, ๐›ฟ_1) + (๐œบ_2, ๐›ฟ_2) = (๐œบ_1+๐œบ_2,๐›ฟ_1+๐›ฟ_2).

The downside of using the naive composition is that the total budget is growing linearly with the number of DP aggregations. There are others, more advanced ways of composing budgets, such that the total budget is growing slower than linearly in the number of DP aggregations.

One of those ways is PLD (privacy loss distributions, math details), which is implemented in dp_accounting library.

Goals

Implementation class PLDBudgetAccountant, which uses dp_accounting library.

class MechanismType(Enum):
   LAPLACE = 1
   GAUSSIAN = 2
   # more mechanisms will be added later

class PLDBudgetAccountant:
   def __init__(self, total_eps : double, total_delta : double):
   def request_budget(self, weight : double, mechanism_type : MechanismType, sensitivity:float) -> Budget
   def compute_budgets(self):

class PLDBudget:
   @property def noise_parameter()

The idea is that PLDBudgetAccountant collects all requests on budget usage from different DP operations (the same way as BudgetAccountant). request_budget() returns a PLDBudget object which does not have yet noise_parameter. And then when the pipeline is constructed, (i.e. all DP operations are known) compute_budget is called and it computes the minimum noise for each DP aggregation, such that the total budget is not more than (total_eps, total_delta)

Note: for some DP operations it might be more important to get more accurate result, that is why weight parameter is used.

Notes on implementation

Note: In this task we can consider PLD as a blackbox, with the following properties

Using PLD for composing of DP mechanisms

Let input is 0<= delta < 1 and DP mechanisms (M_1, ..., M_n)

Example: M_1 is a gaussian mechanism with std = 2, M_2 is laplace mechanism with std = 1 etc

Goal: to compute minimal eps, such that composition of (M_1, ..., M_n) is (eps, delta)-DP.

With PLD it works the following way:

  1. Convert M_i -> PLD_i (using methods from_laplace_mechanism and from_gaussian_mechanism ).
  2. Use method compose to get pld which correspond to the composition
  3. Compute eps = pld.get_epsilon_for_delta(delta)

Note: For PLD the language of (eps, delta) is not natural. There is not much sense to talk about (eps, delta) of a specific mechanism, it's better to think about (eps, delta) of the whole pipeline.

Numerically calibrating noise level

In contrast to the previous subsection, we don't have noise parameters. We would like to find them. Assume that in the setup of the previuos subsection:

we have mechanism (M_1, ..., M_n), with known types (Laplace or Gaussian for now) and a standard deviation of M_i is x/weight_i , where x > 0 is unknown.

The goal is to find minimum x, such that the composition of (M_1, ..., M_n) is (eps, delta)-DP.

Using approach of the previous subsection, for each x > 0 we can compute eps(x). Note, that *eps(x) is decreasing function. So we can do a binary search by x.

Links

  1. dp_accounting library
  2. (optional, more details on PLD) David M Sommer, Sebastian Meiser, and Esfandiar Mohammadi. Privacy loss classes: The central limit theorem in differential privacy. Proceedings on Privacy Enhancing Technologies, 2019.
jspacek commented 3 years ago

๐Ÿ‘‹ @dvadym I'm interesting in working on this issue, and paired programming would be helpful ๐Ÿ !

dvadym commented 3 years ago

Thanks!

dvadym commented 3 years ago

Note: This is a description of an extension of PLD-based accountant.

Support of generic (eps, delta)-dp mechanisms

Context

Now we have support of Laplace and Gaussian mechanism, but there are more DP mechanisms. This feature is about allowing to support, any mechanism, for which only (eps, delta) parameters are known.

The dp_accounting library supports creating PLD for (eps, delta) mechanisms (link).

Notes:

  1. Usually PLD provides a better utility when DP mechanism type is known and supported by PLD (currently Laplace, Gaussian and Randomized response are supported by dp_accounting library). But it's not always the case.
  2. Private partition selection is a good use case where support of (eps, delta)-dp mechanisms are needed.

Changes

  1. Add a new element in MechanismType.
  2. Add provide a way to return eps, delta in MechanismSpec (or maybe a new class?).
  3. Incorporate generic (eps, delta) mechanisms into _find_minimum_noise_std - there should be some conversion between (eps, delta) and std of the noise (happy to chat about that).

Calibrating std of the noise vs (eps, delta) (new section)

The current calibration works that way:

  1. The function compose_distributions maps std -> PLD, and
  2. Then we get eps, from the obtained PLD.
  3. That's we have the function std -> eps, which is decreasing and so we can do binary search

But now we'd like to introduce a generic (eps, delta)-dp mechanism, which doesn't have natural definition of std. There are multiple ways to do that. In principle any way, which maps "std" -> (eps, delta) in some decreasing way would be enough to build 1d optimization problem with a decreasing goal function. One of the possible ways is the following:

  1. Pretend that (eps0, delta0) correspond to Laplace mechanism. We need to find (eps0, delta0) in compose_distribtuions from std. The parameter of this Laplace mechanism is

    b = 1/eps0 => eps0 = 1/b
    std = sqrt(2)b => b = std/sqrt(2), 

    Hence eps0 = 1/b = sqrt(2)/std

  2. And let's take delta0 is to be proportional to eps0, i.e. eps0/eps_total = delta0/delta_total => delta0 = eps0/eps_total*delta_total

  3. In compose_distribtuions function let's use from_privacy_parameters to create PLD for (eps0, delta0)-dp mechanism.

  4. The rest of the calibration is the same.

Note: As mentioned there are different ways how to map std -> (eps, delta), which provides different utility tradeoffs between Laplace/Gaussian vs (eps, delta)-dp generic mechanism.

dvadym commented 3 years ago

@jspacek FYI I've updated a link how to create PLD from (eps, delta) guarantees (step 3 in description)