WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/
Other
515 stars 216 forks source link

k-anonymity analysis does not model browser ID #1001

Open martinthomson opened 5 months ago

martinthomson commented 5 months ago

The results in this short paper contain a false negative probability for the concrete numbers that are being used in practice. This is useful, but not directly applicable because they fail to account for the collision risk associated with duplicate values of the low entropy browser IDs that the API uses.

With the number of bits ($j$) in the browser identity being 11 or less the odds of a collision before hitting a threshold of 50 is quite high due to the birthday paradox ($2\cdot log(50) \approx 11.3$). That means that false negative chance could be quite a bit higher than the analysis suggests.

In #1000 I suggested an alternative design that isn't vulnerable to this particular problem.

aleepasto commented 5 months ago

Hi Martin, you are right that we have not incorporated the collision probability in that analysis.

We are using j=16 in the production setup. That corresponds to about <2% probability of any collision for a group of K=50 users. Moreover, for a group of size K=50 the probability of 2 or more collisions (i.e., less than or equal to 48 distinct values among them) is < 0.02%.

In our analysis we have shown that for a group of 50 + 8 = 58 users the probability of a false negative without any collision is < 1%. By a similar analysis, a pessimistic bound shows that a group of 50+8+2 = 60 users will have a false negative with probability < 1.03% (as such a group will result in less than 58 distinct elements with probability <0.03% and for 58 distinct element, DP noise results in a false negative with probability < 1%).

Thanks for pointing this out, we will update our PDF to include this analysis.