PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
This PR introduces the combiner.expects_per_partition_sampling() method.
If at least one combiner returns true from expects_per_partition_sampling(), per-partition sampling is performed. Sampling within a partition is required for Mean/Variance/Quantiles. On the other hand, when SUM is computed with min/max_sum_per_partition contribution bounding, there should be no sampling, since SumCombiner first sums the contributions within each partition and then clips the result to min/max_sum_per_partition.
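The decision rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the classes SketchCombiner, SketchMeanCombiner, and SketchSumCombiner, the helpers needs_per_partition_sampling, bound_contributions, and clipped_partition_sum, and the max_contributions_per_partition parameter name are all assumptions made for the example. Only the method name expects_per_partition_sampling() comes from the PR.

```python
import random
from typing import Iterable, List


class SketchCombiner:
    """Hypothetical base class; a stand-in, not PipelineDP's Combiner."""

    def expects_per_partition_sampling(self) -> bool:
        raise NotImplementedError


class SketchMeanCombiner(SketchCombiner):
    """Stand-in for a combiner that needs a bounded number of values."""

    def expects_per_partition_sampling(self) -> bool:
        return True


class SketchSumCombiner(SketchCombiner):
    """Stand-in for SUM with min/max_sum_per_partition bounding."""

    def expects_per_partition_sampling(self) -> bool:
        return False


def needs_per_partition_sampling(combiners: Iterable[SketchCombiner]) -> bool:
    # Sampling is performed if at least one combiner asks for it.
    return any(c.expects_per_partition_sampling() for c in combiners)


def bound_contributions(values: List[float],
                        combiners: List[SketchCombiner],
                        max_contributions_per_partition: int,
                        rng: random.Random) -> List[float]:
    # Values are one privacy unit's contributions to a single partition.
    if (needs_per_partition_sampling(combiners) and
            len(values) > max_contributions_per_partition):
        return rng.sample(values, max_contributions_per_partition)
    # No sampling: every contribution is passed on unchanged.
    return list(values)


def clipped_partition_sum(values: List[float], min_sum: float,
                          max_sum: float) -> float:
    # Sum first, then clip the total to [min_sum, max_sum].
    return max(min_sum, min(max_sum, sum(values)))
```

In this sketch, a SUM-only aggregation keeps all of a privacy unit's values in the partition and bounds sensitivity by clipping the total, whereas adding a Mean-like combiner to the same aggregation switches sampling on for that partition.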