PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
This PR implements the computation of histograms over (partition_key, count), i.e. how many partitions have count=1, how many partitions have count=2, ..., and over (partition_key, privacy_id_count), i.e. how many partitions have privacy_id_count=1, how many partitions have privacy_id_count=2, ...
These are the same histograms that are used for computing cross-partition and per-partition contributions per privacy_id.
These histograms will be used in parameter tuning.
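
For illustration, here is a minimal in-memory sketch of what the two histograms contain. It is plain Python and does not use the PipelineDP API. The function name `partition_histograms` and the input layout (one `(privacy_id, partition_key)` pair per record) are assumptions made for this example. The actual implementation runs on a pipeline backend and may bucket values differently.

```python
from collections import Counter


def partition_histograms(rows):
    """Hypothetical helper: rows is an iterable of (privacy_id, partition_key)."""
    rows = list(rows)
    # count: number of records in each partition.
    count_per_partition = Counter(pk for _, pk in rows)
    # privacy_id_count: number of distinct privacy ids in each partition.
    ids_per_partition = Counter(pk for _, pk in set(rows))
    # Histogram: value -> number of partitions that have this value.
    count_hist = Counter(count_per_partition.values())
    privacy_id_count_hist = Counter(ids_per_partition.values())
    return dict(count_hist), dict(privacy_id_count_hist)


rows = [
    ("u1", "A"), ("u1", "A"), ("u2", "A"),  # partition A: 3 records, 2 ids
    ("u1", "B"),                            # partition B: 1 record, 1 id
    ("u3", "C"), ("u3", "C"),               # partition C: 2 records, 1 id
]
count_hist, privacy_id_count_hist = partition_histograms(rows)
print(count_hist)             # one partition each with count 1, 2 and 3
print(privacy_id_count_hist)  # two partitions with 1 id, one with 2 ids
```

Reading the result as "how many partitions have value k" matches the description above: each histogram key is a count (or privacy_id_count) and each histogram value is the number of partitions with that count.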