OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0

Expose explain computations in API #387

Closed: dvadym closed this 1 year ago

dvadym commented 1 year ago

This PR contains the following changes:
1. Introducing the ExplainComputationReport class, a container for the Explain Computation report.
2. Adding ExplainComputationReport to the DPEngine.aggregate arguments, so that the report can be returned to the caller.
3. Adding ExplainComputationReport to the Beam and Spark APIs.
4. Updating the examples.
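
The "report passed in as an argument" idea from the changes listed here can be sketched with the standard library alone. This is only an illustration of the out-parameter pattern, not the PipelineDP implementation: `ReportSketch`, `aggregate_sketch`, `add_stage`, `text`, and `out_report` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReportSketch:
    """Hypothetical report container; NOT the PipelineDP class."""
    stages: List[str] = field(default_factory=list)

    def add_stage(self, description: str) -> None:
        # Each computation stage appends one human-readable line.
        self.stages.append(description)

    def text(self) -> str:
        # Render the stages as a numbered list.
        return "\n".join(f" {i}. {s}" for i, s in enumerate(self.stages, 1))


def aggregate_sketch(values, out_report: Optional[ReportSketch] = None):
    """Hypothetical aggregation that fills an optional report as a side effect."""
    total = sum(values)
    if out_report is not None:
        out_report.add_stage(f"Computed sum over {len(values)} values")
    return total


report = ReportSketch()
total = aggregate_sketch([1, 2, 3], out_report=report)
print(report.text())
```

Making the report an optional argument means callers who do not need an explanation can call the function exactly as before.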

An example of the Explain Computation report:

DPEngine method: aggregate
AggregateParams:
 metrics=['SUM']
 noise_kind=laplace
 budget_weight=1
 Contribution bounding:
  max_partitions_contributed=2
  max_contributions_per_partition=1
  min_value=1
  max_value=5
 Partition selection: private partitions
Computation graph:
 1. Per-partition contribution bounding: for each privacy_id and each partition, randomly select max(actual_contributions_per_partition, 1) contributions.
 2. Cross-partition contribution bounding: for each privacy_id randomly select max(actual_partition_contributed, 2) partitions
 3. Private Partition selection: using Truncated Geometric method with (eps=0.5, delta=1e-06)
 4. Computed sum with (eps=0.5 delta=0)
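
Steps 1, 2 and 4 of the computation graph above can be sketched in plain Python. This is a rough illustration under stated assumptions, not PipelineDP code. Bounding is assumed to keep at most the configured number of items (the smaller of the actual count and the bound), which is the usual reading of contribution bounding. The L1 sensitivity is assumed to be max_partitions_contributed * max_contributions_per_partition * max(|min_value|, |max_value|) = 2 * 1 * 5 = 10, and the Laplace scale is assumed to be sensitivity / eps = 10 / 0.5 = 20. The library may calibrate noise differently. Step 3 (private partition selection) is omitted, so the sketch on its own is not differentially private. All function names are hypothetical.

```python
import random
from collections import defaultdict


def bound_contributions(rows, max_per_partition, max_partitions, rng):
    """Bounds contributions; rows are (privacy_id, partition, value) tuples."""
    # Group values by (privacy_id, partition).
    grouped = defaultdict(list)
    for privacy_id, partition, value in rows:
        grouped[(privacy_id, partition)].append(value)

    # Step 1: per-partition bounding, keep at most max_per_partition values.
    bounded = {
        key: rng.sample(values, min(len(values), max_per_partition))
        for key, values in grouped.items()
    }

    # Step 2: cross-partition bounding, keep at most max_partitions partitions.
    partitions_of = defaultdict(list)
    for privacy_id, partition in bounded:
        partitions_of[privacy_id].append(partition)
    kept = {}
    for privacy_id, partitions in partitions_of.items():
        chosen = rng.sample(partitions, min(len(partitions), max_partitions))
        for partition in chosen:
            kept[(privacy_id, partition)] = bounded[(privacy_id, partition)]
    return kept


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sums(kept, min_value, max_value, l1_sensitivity, eps, rng):
    """Step 4: clamp values, sum per partition, add Laplace noise."""
    sums = defaultdict(float)
    for (_, partition), values in kept.items():
        for value in values:
            sums[partition] += min(max(value, min_value), max_value)
    scale = l1_sensitivity / eps  # assumed calibration, see lead-in
    return {p: s + laplace_noise(scale, rng) for p, s in sums.items()}


rng = random.Random(0)
rows = [("u1", "a", 1), ("u1", "a", 2), ("u1", "b", 3),
        ("u1", "c", 4), ("u2", "a", 5)]
kept = bound_contributions(rows, max_per_partition=1, max_partitions=2, rng=rng)
result = noisy_sums(kept, min_value=1, max_value=5,
                    l1_sensitivity=2 * 1 * 5, eps=0.5, rng=rng)
print(result)
```

In this toy input, privacy id u1 contributes to three partitions and twice to partition a, so both bounding steps drop data before any noise is added.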