Closed by dvadym 3 years ago
Hey, I'd like to take over this issue.
Sure, thanks!
Hey, the PR on PyDP#374 was merged. The newly added API and pseudo-code of usage:
from pydp.algorithms.partition_selection import create_partition_strategy

k_tsgd_selector = create_partition_strategy(
    "truncated_geometric", epsilon, delta, max_partitions
)  # type: PartitionSelectionStrategy
...
def get_dp_partitions(database, partition_selector: PartitionSelectionStrategy):
    # Group the records by partition key, then keep a partition only if the
    # selection strategy accepts it based on its number of contributing users.
    database_partitions = database.partition_by(KEY)
    for partition in database_partitions:
        if partition_selector.should_keep(partition.num_users):
            yield partition

dp_partitions = get_dp_partitions(data, k_tsgd_selector)
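For anyone who wants to try the flow above without PyDP installed, here is a minimal runnable sketch. `FixedThresholdSelector` and `count_users_per_partition` are hypothetical helpers made up for illustration, not part of PyDP. The selector is deliberately NOT differentially private; it only mimics the `should_keep(num_users)` call shape so the example is deterministic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


class FixedThresholdSelector:
    """Hypothetical stand-in for a partition selection strategy.

    NOT differentially private: it keeps a partition whenever the user
    count reaches a fixed threshold, so the output is deterministic.
    """

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold

    def should_keep(self, num_users: int) -> bool:
        return num_users >= self.threshold


def count_users_per_partition(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct users per partition key from (user_id, key) rows."""
    users_by_key: Dict[str, Set[str]] = defaultdict(set)
    for user_id, key in rows:
        users_by_key[key].add(user_id)
    return {key: len(users) for key, users in users_by_key.items()}


def get_dp_partitions(rows: Iterable[Tuple[str, str]], partition_selector) -> Iterator[str]:
    """Yield the partition keys that the selector decides to keep."""
    for key, num_users in count_users_per_partition(rows).items():
        if partition_selector.should_keep(num_users):
            yield key


# "a" has two distinct users (u1, u2); "b" has one (u3).
rows = [("u1", "a"), ("u2", "a"), ("u1", "a"), ("u3", "b")]
kept = sorted(get_dp_partitions(rows, FixedThresholdSelector(threshold=2)))
print(kept)  # → ['a']
```

Swapping the stand-in for the selector returned by `create_partition_strategy` should leave `get_dp_partitions` unchanged, since it only relies on `should_keep`.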
Thanks!
Context
Definition: The partition keys are called private if they are not known in advance but are determined based on the data contributed by the individuals in the datasets. More details.
Private partition selection is a procedure that ensures the output partition keys are selected in a differentially private (DP) manner. There are at least 2 methods for private partition selection. More details:
The PyDP project provides wrappers for Google's C++ DP library, but wrappers for private partition selection are missing.
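To make the idea concrete, here is a rough sketch of Laplace thresholding in plain Python. It is a simplified textbook version, not the C++ library's implementation, and it assumes each user contributes to at most one partition. The function names and the threshold formula `1 + ln(1/(2*delta))/epsilon` are choices made for this sketch. Under that assumption, a partition with a single user is kept with probability `delta`.

```python
import math
import random


def laplace_threshold(epsilon: float, delta: float) -> float:
    """Threshold for the simplified case of one partition per user."""
    if epsilon <= 0 or not 0 < delta < 0.5:
        raise ValueError("need epsilon > 0 and 0 < delta < 0.5")
    return 1.0 + math.log(1.0 / (2.0 * delta)) / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def should_keep(num_users: int, epsilon: float, delta: float,
                rng: random.Random) -> bool:
    """Keep the partition if its noisy user count reaches the threshold."""
    if num_users <= 0:
        # Partitions without users must never be released.
        return False
    noisy_count = num_users + sample_laplace(1.0 / epsilon, rng)
    return noisy_count >= laplace_threshold(epsilon, delta)


rng = random.Random(0)
print(laplace_threshold(epsilon=1.0, delta=1e-5))
print(should_keep(1000, epsilon=1.0, delta=1e-5, rng=rng))
```

Partitions with many users are kept almost surely, while partitions with very few users are almost always dropped; that is what keeps the released key set from revealing any individual's presence.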
Goals
Implement wrappers for truncated geometric thresholding and Laplace/Gaussian thresholding in PyDP.
C++ library API:
Python API