OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0

Introduce AggregateMetrics dataclass #357

Closed dvadym closed 2 years ago

dvadym commented 2 years ago

Before this CL, the output of UtilityAnalysis is a list of metrics in the following format:

Private partitions: [PartitionSelectionMetrics, AggregateErrorMetrics, PartitionSelectionMetrics, ...], where each consecutive pair of PartitionSelectionMetrics and AggregateErrorMetrics corresponds to one Utility Analysis configuration (UtilityAnalysis can run for multiple parameters simultaneously, e.g. for different values of max_partition_contributed).

Public partitions: [AggregateErrorMetrics, AggregateErrorMetrics, ...]

Notes:

  1. PartitionSelectionMetrics contains partition selection metrics (e.g. the expected number of partitions).
  2. AggregateErrorMetrics contains per-partition error metrics (e.g. the average error per partition).

Having the output in different formats for private and public partitions is very inconvenient and requires the calling code to handle each layout itself. This PR introduces the class AggregateMetrics, which contains both PartitionSelectionMetrics and AggregateErrorMetrics.
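
To illustrate the idea, here is a minimal sketch of how such a wrapper could unify the two layouts. It is not the code from this PR. The two metrics classes are simplified stand-ins, and their field names, the Optional selection field, and the helper to_aggregate_metrics are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class PartitionSelectionMetrics:
    # Stand-in for the real class; the field name is an assumption.
    expected_num_partitions: float


@dataclass
class AggregateErrorMetrics:
    # Stand-in for the real class; the field name is an assumption.
    average_error_per_partition: float


@dataclass
class AggregateMetrics:
    """Metrics for one utility analysis configuration (illustrative)."""
    # Assumption: left as None for public partitions, because the old
    # public-partition output carries no PartitionSelectionMetrics.
    partition_selection_metrics: Optional[PartitionSelectionMetrics]
    aggregate_error_metrics: AggregateErrorMetrics


def to_aggregate_metrics(
    flat_metrics: List[Union[PartitionSelectionMetrics, AggregateErrorMetrics]],
    public_partitions: bool,
) -> List[AggregateMetrics]:
    """Hypothetical helper: converts the old flat list to one object per configuration."""
    if public_partitions:
        # Old public layout: [AggregateErrorMetrics, AggregateErrorMetrics, ...]
        return [AggregateMetrics(None, error) for error in flat_metrics]
    # Old private layout: [selection, error, selection, error, ...]
    if len(flat_metrics) % 2 != 0:
        raise ValueError("Expected (selection, error) pairs in the flat list.")
    return [
        AggregateMetrics(flat_metrics[i], flat_metrics[i + 1])
        for i in range(0, len(flat_metrics), 2)
    ]
```

With a wrapper like this, calling code can iterate over one list of AggregateMetrics in both the private and the public case, checking partition_selection_metrics only when it needs it.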