OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0
270 stars 75 forks

Computing privacy_id_count and count in PerPartitionUtilityResults #474

Closed dvadym closed 11 months ago

dvadym commented 11 months ago

This PR contains:

  1. Introducing StatisticsCombiner, a per-partition utility analysis combiner that computes count and privacy_id_count (it can be extended to more statistics in the future).
  2. Adding metrics.Statistics(privacy_id_count, count) to PerPartitionMetrics.
  3. Plumbing work to fill PerPartitionMetrics.statistics.
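
For illustration, the idea behind such a combiner can be sketched roughly as below. This is a minimal standalone sketch, not the code from this PR: the class name `StatisticsCombinerSketch`, the accumulator layout, and the input shape (one contribution count per privacy id within a partition) are all assumptions, and it does not import pipeline_dp.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Accumulator layout assumed for this sketch: (privacy_id_count, count).
Accumulator = Tuple[int, int]


@dataclass
class Statistics:
    """Per-partition raw statistics (field names taken from the PR text)."""
    privacy_id_count: int
    count: int


class StatisticsCombinerSketch:
    """Hypothetical stand-in for a per-partition statistics combiner."""

    def create_accumulator(
            self, counts_per_privacy_id: Iterable[int]) -> Accumulator:
        # Each element is the number of rows one privacy id contributed
        # to this partition (an assumed input shape).
        counts = list(counts_per_privacy_id)
        return len(counts), sum(counts)

    def merge_accumulators(self, acc1: Accumulator,
                           acc2: Accumulator) -> Accumulator:
        # Assumes the two accumulators cover disjoint sets of privacy ids,
        # so both components can simply be added.
        return acc1[0] + acc2[0], acc1[1] + acc2[1]

    def compute_metrics(self, acc: Accumulator) -> Statistics:
        return Statistics(privacy_id_count=acc[0], count=acc[1])


combiner = StatisticsCombinerSketch()
acc = combiner.merge_accumulators(
    combiner.create_accumulator([3, 1]),  # two privacy ids, 4 rows
    combiner.create_accumulator([2]))  # one privacy id, 2 rows
stats = combiner.compute_metrics(acc)
```

Keeping the accumulator as a plain tuple of sums makes merging associative and commutative, which is what batch backends generally need from a combiner.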

dvadym commented 11 months ago

Thanks for review!