OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0

Fix split budget for multi configuration in Utility Analysis #385

Closed dvadym closed 1 year ago

dvadym commented 1 year ago

What was broken?

For simplicity, assume that metrics = [count, sum] and that there are 2 configurations to compute: max_partitions_contributed = [1, 2].

A UtilityAnalysisCombiner is created for each metric and each input configuration. Before this PR, the code was the following:

for metric in metrics:
    for configuration in configurations:
        budget = request_budget()
        # create combiner

The problem is that this requests budget 2*2 = 4 times, which is incorrect, since different configurations have different budgets. This PR fixes that by ensuring that request_budget is called once per metric, regardless of the number of configurations.
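
A minimal sketch of the difference, not the actual PipelineDP code: `ToyBudgetAccountant`, `create_combiners_before`, and `create_combiners_after` are hypothetical names, the even split of epsilon across requests is an assumption made only for illustration, and each combiner is represented by a plain tuple.

```python
from dataclasses import dataclass


@dataclass
class ToyBudgetAccountant:
    """Hypothetical stand-in for a budget accountant; it only counts requests."""
    total_epsilon: float
    n_requests: int = 0

    def request_budget(self) -> int:
        # Register one more budget request and return its id.
        self.n_requests += 1
        return self.n_requests - 1

    def epsilon_per_request(self) -> float:
        # Assumption: the total budget is split evenly across all requests.
        return self.total_epsilon / self.n_requests


def create_combiners_before(accountant, metrics, configurations):
    # Old behavior: one budget request per (metric, configuration) pair.
    combiners = []
    for metric in metrics:
        for configuration in configurations:
            budget = accountant.request_budget()
            combiners.append((metric, configuration, budget))
    return combiners


def create_combiners_after(accountant, metrics, configurations):
    # Fixed behavior: one budget request per metric, reused by every configuration.
    combiners = []
    for metric in metrics:
        budget = accountant.request_budget()
        for configuration in configurations:
            combiners.append((metric, configuration, budget))
    return combiners


metrics = ["count", "sum"]
configurations = [1, 2]  # values of max_partitions_contributed

before = ToyBudgetAccountant(total_epsilon=1.0)
combiners_before = create_combiners_before(before, metrics, configurations)

after = ToyBudgetAccountant(total_epsilon=1.0)
combiners_after = create_combiners_after(after, metrics, configurations)

print(before.n_requests, before.epsilon_per_request())  # 4 0.25
print(after.n_requests, after.epsilon_per_request())    # 2 0.5
```

In both variants 4 combiners are created; only the number of budget requests changes (4 before, 2 after), so under the even-split assumption each metric keeps the same share of budget however many configurations are analyzed.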

dvadym commented 1 year ago

Thanks for the review!