OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0

DP percentiles #351

Closed dvadym closed 1 year ago

dvadym commented 2 years ago

This PR implements DP percentile computations.

API: This PR adds the metric metrics.Percentile, so percentiles can now be computed with DPEngine.aggregate:

dp_engine.aggregate(... metrics=[metrics.Count, metrics.Percentile(50), metrics.Percentile(90)] ...)

The result looks like: (8, MetricsTuple(count=14532.95396910912, percentile_50=2.9999810443476416, percentile_90=4.99996013795033))
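To show how one such output row might be consumed, here is a minimal sketch. It assumes each row is a (partition_key, metrics) pair and that MetricsTuple behaves like a namedtuple; the MetricsTuple defined below is a local stand-in built for illustration, not the class from this PR.

```python
from collections import namedtuple

# Local stand-in for MetricsTuple (assumption: it behaves like a namedtuple
# whose field names follow the requested metrics).
MetricsTuple = namedtuple("MetricsTuple",
                          ["count", "percentile_50", "percentile_90"])

# One output row, copied from the example result above.
row = (8,
       MetricsTuple(count=14532.95396910912,
                    percentile_50=2.9999810443476416,
                    percentile_90=4.99996013795033))

# Assumption: the first element is the partition key.
partition_key, metrics = row

# Fields can be read by name or converted to a dict for further processing.
median = metrics.percentile_50
as_dict = metrics._asdict()
print(partition_key, round(median, 2), list(as_dict))
```

The values are noisy estimates, so downstream code should not expect them to be integers even when the underlying data is.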

Implementation:

  1. Metrics: previously this was an enum, but a parameterized metric such as metrics.Percentile(50) can't be expressed as an enum member, so a Metric class was introduced.
  2. QuantileCombiner is implemented. It uses the QuantileTree object, which is a PyDP wrapper around the corresponding object from Google's C++ building block library.
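
The first point can be illustrated with a small sketch of why a class is needed once a metric carries a parameter. The names Metric, Metrics, parameter and the field-naming rule below are hypothetical and chosen only for illustration; the classes in this PR may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """Hypothetical stand-in for a metric that may carry a parameter."""
    name: str
    parameter: Optional[float] = None  # e.g. 50 for the 50th percentile

    def __str__(self) -> str:
        # Parameterized metrics get the parameter appended to their name.
        if self.parameter is None:
            return self.name
        return f"{self.name}_{self.parameter:g}"


class Metrics:
    """Namespace that keeps the enum-like spelling for simple metrics."""
    Count = Metric("count")
    Sum = Metric("sum")

    @staticmethod
    def Percentile(percentile_to_compute: float) -> Metric:
        # An enum has a fixed set of members, so it cannot represent
        # every possible percentile; a factory returning Metric can.
        if not 0 <= percentile_to_compute <= 100:
            raise ValueError("percentile must be in [0, 100]")
        return Metric("percentile", percentile_to_compute)


requested = [Metrics.Count, Metrics.Percentile(50), Metrics.Percentile(90)]
field_names = [str(m) for m in requested]
print(field_names)
```

Making the dataclass frozen keeps metrics hashable and comparable by value, so two separately created Percentile(50) objects are treated as the same metric, much like enum members were.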