PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
This PR contains the following changes:
1. Introducing the ExplainComputationReport class, a container for the Explain Computation report.
2. Adding an ExplainComputationReport argument to DPEngine.aggregate, through which the report is returned.
3. Adding ExplainComputationReport to the Beam and Spark APIs.
4. Updating the examples.
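The idea behind changes 1 and 2, a container that the caller creates and passes in so the aggregation can fill it, can be sketched as follows. This is only an illustration of the pattern: `ReportSketch`, `add_stage`, `text`, and `aggregate_sketch` are hypothetical names chosen for this example and are not taken from this PR's code.

```python
class ReportSketch:
    """Hypothetical stand-in for a report container (not PipelineDP code)."""

    def __init__(self):
        self._stages = []

    def add_stage(self, description):
        # Each computation stage records a human-readable description.
        self._stages.append(description)

    def text(self):
        # Render the stages as a numbered list, one per line.
        return "\n".join(
            f"{i}. {stage}" for i, stage in enumerate(self._stages, 1))


def aggregate_sketch(values, out_report=None):
    """Hypothetical aggregation that fills the report when one is passed."""
    if out_report is not None:
        out_report.add_stage("Computed sum (no noise in this sketch)")
    return sum(values)


report = ReportSketch()
total = aggregate_sketch([1, 2, 3], out_report=report)
print(report.text())
```

Passing the report as an optional argument leaves the return value of the aggregation unchanged, so callers that do not need the report are unaffected.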
An example of the Explain Computation report:
DPEngine method: aggregate
AggregateParams:
metrics=['SUM']
noise_kind=laplace
budget_weight=1
Contribution bounding:
max_partitions_contributed=2
max_contributions_per_partition=1
min_value=1
max_value=5
Partition selection: private partitions
Computation graph:
1. Per-partition contribution bounding: for each privacy_id and each partition, randomly select max(actual_contributions_per_partition, 1) contributions.
2. Cross-partition contribution bounding: for each privacy_id randomly select max(actual_partition_contributed, 2) partitions
3. Private Partition selection: using Truncated Geometric method with (eps=0.5, delta=1e-06)
4. Computed sum with (eps=0.5 delta=0)
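To make steps 1 and 2 of the computation graph concrete, here is a minimal standalone sketch of contribution bounding in plain Python. It is not PipelineDP's implementation: the function name `bound_contributions`, the `(privacy_id, partition, value)` row layout, and the clamping of values to `[min_value, max_value]` are assumptions made for this illustration. The sketch treats each bound as an upper limit, so at most that many contributions (or partitions) are kept. Partition selection and noise addition (steps 3 and 4) are deliberately left out.

```python
import random
from collections import defaultdict


def bound_contributions(rows, max_partitions_contributed,
                        max_contributions_per_partition,
                        min_value, max_value, rng):
    """Illustrative bounding; rows are (privacy_id, partition, value)."""
    # Group values by privacy_id, then by partition.
    per_id = defaultdict(lambda: defaultdict(list))
    for privacy_id, partition, value in rows:
        per_id[privacy_id][partition].append(value)

    bounded = []
    for privacy_id, partitions in per_id.items():
        # Step 1: per-partition bounding, keep a random sample of at most
        # max_contributions_per_partition values in every partition.
        sampled = {
            partition: rng.sample(
                values, min(len(values), max_contributions_per_partition))
            for partition, values in partitions.items()
        }
        # Step 2: cross-partition bounding, keep a random sample of at most
        # max_partitions_contributed partitions for this privacy_id.
        kept = rng.sample(
            list(sampled), min(len(sampled), max_partitions_contributed))
        for partition in kept:
            for value in sampled[partition]:
                # Assumption: values are clamped to [min_value, max_value].
                clamped = min(max(value, min_value), max_value)
                bounded.append((privacy_id, partition, clamped))
    return bounded


rows = [
    ("u1", "a", 3), ("u1", "a", 9), ("u1", "b", 2), ("u1", "c", 0),
    ("u2", "a", 4),
]
bounded = bound_contributions(
    rows, max_partitions_contributed=2, max_contributions_per_partition=1,
    min_value=1, max_value=5, rng=random.Random(0))
```

With the parameters from the report above, each privacy_id ends up with at most one value in each of at most two partitions, and every value lies in [1, 5].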