PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark and Apache Beam.
The DataFrame API performs DP aggregations on DataFrames (currently only Spark DataFrames are supported). In the DataFrame API, privacy_key, partition_key, and values are specified by column names. Currently partition_key (the group-by key) can be specified by only one column. This PR adds support for using multiple columns as partition_key.
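To illustrate the idea of a multi-column partition_key, here is a minimal sketch in plain Python. It does not use PipelineDP or Spark and adds no DP noise; the helper `partition_key_of`, the sample rows, and the column names `country` and `device` are all hypothetical and chosen only for illustration. It shows one possible way to treat several columns as a single composite group-by key, not how this PR implements it.

```python
from collections import Counter
from typing import Mapping, Sequence, Tuple, Union

Row = Mapping[str, object]


def partition_key_of(row: Row, columns: Union[str, Sequence[str]]) -> Tuple:
    """Builds a composite key from one or more columns (hypothetical helper).

    A single column name is accepted as well, so existing one-column
    callers keep working; the key is always returned as a tuple.
    """
    if isinstance(columns, str):
        columns = [columns]
    return tuple(row[column] for column in columns)


# Hypothetical input rows; user_id plays the role of privacy_key.
rows = [
    {"user_id": 1, "country": "US", "device": "mobile", "spend": 3.0},
    {"user_id": 2, "country": "US", "device": "mobile", "spend": 1.5},
    {"user_id": 3, "country": "US", "device": "desktop", "spend": 4.0},
    {"user_id": 1, "country": "DE", "device": "mobile", "spend": 2.0},
]

# Grouping by one column versus by two columns.
# These are plain, non-private counts: no noise or contribution bounding.
counts_single = Counter(partition_key_of(r, "country") for r in rows)
counts_multi = Counter(partition_key_of(r, ["country", "device"]) for r in rows)
```

With a single column, all three US rows fall into one partition; with two columns, they split into the ("US", "mobile") and ("US", "desktop") partitions.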