OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0

Absolute budget per aggregation #265

Open dvadym opened 2 years ago

dvadym commented 2 years ago

Context

The workflow for computing DP aggregations with PipelineDP is the following (steps that are not important here are omitted; see the full example):

# Define the total budget.
budget_accountant = pipeline_dp.NaiveBudgetAccountant(total_epsilon=1, total_delta=1e-6)
dp_engine = pipeline_dp.DPEngine(budget_accountant, ...)
dp_result1 = dp_engine.aggregate(input_data, params1, ...)
dp_result2 = dp_engine.aggregate(input_data, params2, ...)
...
# Compute the budget for each DP operation.
budget_accountant.compute_budgets()

DPEngine.aggregate is the API function that performs a DP aggregation. Currently, the only way to specify how the budget is split over multiple aggregations is the budget_weight field. The idea is that each aggregation gets a share of the budget proportional to its weight (the weights do not have to sum to 1).
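To make the proportional split concrete, here is a minimal sketch in plain Python. It does not call PipelineDP; `split_by_weight` is a hypothetical helper, and it assumes naive composition in which both epsilon and delta are divided in proportion to the weights.

```python
from typing import List, Sequence, Tuple


def split_by_weight(total_epsilon: float, total_delta: float,
                    weights: Sequence[float]) -> List[Tuple[float, float]]:
    """Hypothetical helper: splits (epsilon, delta) proportionally to weights.

    Assumes naive composition; this is an illustration, not PipelineDP code.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("The sum of weights must be positive.")
    return [(total_epsilon * w / total_weight, total_delta * w / total_weight)
            for w in weights]


# Two aggregations with weights 1 and 2 share the total budget as 1/3 and 2/3.
budgets = split_by_weight(total_epsilon=1.0, total_delta=1e-6, weights=[1, 2])
for epsilon, delta in budgets:
    print(f"epsilon={epsilon:.4f}, delta={delta:.2e}")
```

Because the weights are normalized by their sum, only their ratios matter: weights [1, 2] and [10, 20] would give the same split.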

Another downside is that budget_accountant.compute_budgets() splits the whole available budget, so it can be called only once.

Goal

Introduce a way to request an absolute budget, i.e. an (epsilon, delta) pair per aggregation.

thehimalayanleo commented 2 years ago

I'd like to work on this. Just to confirm a few details,

  1. Does aggregation just add a new parameter which is then finalized at the end with the compute_budgets() call?
  2. So right now budget_weight splits the entire budget, right? And we instead want the option to add an absolute budget independent of the total budget? Is my understanding correct?
  3. Should the call to budget_accountant.compute_budgets() then specify the absolute budget for this computation and for the rest split the total budget - absolute budget?

To summarize, we either use the budget_weight or the absolute_budget param.

dvadym commented 2 years ago

Thanks Ajinkya! Sure, go ahead

My answers:

  1. Yes, AggregationParams needs to have 2 additional parameters, epsilon and delta.
  2. Yes, correct: currently budget_weight determines how the whole budget is split. Please keep in mind that the weights don't need to sum to 1, e.g. weights = [1, 2] would mean the budget is split as 1/3 for the first aggregation and 2/3 for the second. Regarding "And we instead want the option to add an absolute budget independent of the total budget?": yes, correct, and compute_budgets() should check that the requested absolute budget doesn't exceed the total budget.
  3. Do you mean to have budget_accountant.compute_budgets(epsilon, delta)? Yes, this is a very good idea for the case when the PipelineDP user doesn't want to spend the whole budget immediately. I thought about creating a separate issue for this, but if you like we can do it in this issue as well.
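One possible reading of these answers, as a minimal sketch in plain Python: absolute requests are granted first, checked against the total, and the remainder is split by weight. `BudgetRequest` and `split_total_budget` are hypothetical names for illustration only; they are not part of PipelineDP, and the accounting assumes naive composition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BudgetRequest:
    """Hypothetical request: a weight, or an absolute (epsilon, delta)."""
    weight: float = 1.0
    epsilon: Optional[float] = None
    delta: Optional[float] = None

    @property
    def is_absolute(self) -> bool:
        return self.epsilon is not None


def split_total_budget(total_epsilon: float, total_delta: float,
                       requests: List[BudgetRequest]
                      ) -> List[Tuple[float, float]]:
    """Grants absolute requests first, then splits the rest by weight."""
    absolute = [r for r in requests if r.is_absolute]
    used_epsilon = sum(r.epsilon for r in absolute)
    used_delta = sum(r.delta or 0.0 for r in absolute)
    # The requested absolute budget must not exceed the total budget.
    if used_epsilon > total_epsilon or used_delta > total_delta:
        raise ValueError("Requested absolute budget exceeds the total budget.")
    remaining_epsilon = total_epsilon - used_epsilon
    remaining_delta = total_delta - used_delta
    total_weight = sum(r.weight for r in requests if not r.is_absolute)
    result = []
    for r in requests:
        if r.is_absolute:
            result.append((r.epsilon, r.delta or 0.0))
        else:
            share = r.weight / total_weight
            result.append((remaining_epsilon * share, remaining_delta * share))
    return result


# One absolute request of epsilon=0.4; the remaining 0.6 is split 1:2.
granted = split_total_budget(1.0, 1e-6, [
    BudgetRequest(epsilon=0.4, delta=0.0),
    BudgetRequest(weight=1),
    BudgetRequest(weight=2),
])
print(granted)
```

In this sketch a request is either absolute or weighted, never both, which matches the "either budget_weight or absolute budget" summary earlier in the thread. Whether the remainder should be spent at all, or kept for a later compute_budgets(epsilon, delta) call, is a separate design decision.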

Please let me know if you have more questions. I'm happy to help.

dvadym commented 2 years ago

Hi @thehimalayanleo! Any progress? Is there anything I can help with? If needed, we can have a VC or a pair programming session.

thehimalayanleo commented 2 years ago

Hi @dvadym, I've hit a logistical blocker. Do you think we could have a quick VC? I spoke with Chinmay about this too!

dvadym commented 2 years ago

@thehimalayanleo sure, we can have a VC. Please message me on Slack and we can find a time. In the OpenMined workspace my Slack handle is vadym.

thehimalayanleo commented 2 years ago

Slacked you!