Closed ganevgv closed 3 years ago
Thanks for your feedback and for reporting this error in the Laplace noise parameter. This bug has been fixed in 540a442867dd278a74af4c8f488fc6fadce7088a where the parameter is (num_attributes - k) / epsilon
. In this setting, 1
is chosen as the sensitivity of count values.
I think that the new version should retain a sensitivity of 2
and only remove the factor of 1/num_tuples
See also this blog post on how to reason about the correct sensitivity of count queries: https://github.com/frankmcsherry/blog/blob/master/posts/2018-02-25.md
The sensitivity of count queries can be either 1
or 2
depending on how the privacy of an individual is interpreted. The sensitivity of 1
corresponds to the "standard" definition of Differential Privacy, i.e., whether an individual is present or absent from the dataset. Please correct me if I'm wrong. The blog https://github.com/frankmcsherry/blog/blob/master/posts/2018-02-25.md uses the sensitivity of 2
assuming that the values of an individual are altered in neighboring datasets.
Thanks for your quick reply and fix @haoyueping! I'm closing the issue.
Re: sensitivity 1
or 2
- you're right, it could be either depending on the interpretation. Technically, I think in PrivBayes they use 2
, but personally I'm okay with either.
Description
I believe the sensitivity of the conditional distributions is wrong and as a result you're injecting too little noise to the conditional distributions in the correlated attribute mode with enabled epsilon.
In Algorithm 1 in PrivBayes, the stated sensitivity is
2 / n_tuples
because the Laplace noise is added to the joint probability (as the probability is already materialised). However, you apply the Laplace noise to the counts (as opposed to a probability) which have sensitivity of2
(and not2 / n_tuples
). Only later you normalise the counts to materialise the conditional probability. As a result, you're injecting too little noise to the conditional probabilities.Code excerpts from
DataSynthesizer/lib/PrivBayes.py
:In order to inject the correct amount of noise
2 * (num_attributes - k) / (num_tuples * epsilon)
should be changed to2 * (num_attributes - k) / epsilon
.PS: thanks to @sofiane87 and @bristena-op for the helpful discussions and clarifications.