PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
Right now, utility analysis combines all partitions with equal weight. We want to let clients combine them in a more elaborate way, by providing a set of predefined weighting functions and by allowing clients to specify their own. For example, we can provide a weighting function that weights each partition by its contribution (e.g. for the COUNT metric, the partition's size).
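To make the proposal concrete, here is a minimal sketch of what pluggable weighting could look like. It does not use the PipelineDP API: `PartitionResult`, `equal_weight`, `size_weight`, and `combine_errors` are hypothetical names chosen for illustration, and the per-partition `error` field stands in for whatever metric utility analysis actually reports.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PartitionResult:
    """Hypothetical per-partition output of utility analysis."""
    size: int     # contribution to the partition; for COUNT, the partition size
    error: float  # placeholder per-partition error metric


# A weighting function maps a partition to a non-negative weight.
WeightFn = Callable[[PartitionResult], float]


def equal_weight(partition: PartitionResult) -> float:
    """Current behavior: every partition counts the same."""
    return 1.0


def size_weight(partition: PartitionResult) -> float:
    """Predefined option: weight by the partition's contribution."""
    return float(partition.size)


def combine_errors(partitions: Iterable[PartitionResult],
                   weight_fn: WeightFn = equal_weight) -> float:
    """Returns the weighted average of per-partition errors."""
    partitions = list(partitions)
    weights = [weight_fn(p) for p in partitions]
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one partition needs a positive weight")
    return sum(w * p.error for w, p in zip(weights, partitions)) / total


partitions = [
    PartitionResult(size=10, error=2.0),
    PartitionResult(size=90, error=1.0),
]
equal = combine_errors(partitions)                # equal weighting
by_size = combine_errors(partitions, size_weight)  # weighting by size
# Client-specified weighting: square root of the partition size.
custom = combine_errors(partitions, lambda p: p.size ** 0.5)
```

With equal weights the small, noisy partition pulls the combined error up as much as the large one; weighting by size moves the result toward the error of the partition that holds most of the data. Taking a plain callable keeps the predefined functions and client-defined ones interchangeable.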