csharrison / aggregate-reporting-api

Aggregate Reporting API

Guidance on possible range of values for T #3

Open benjaminsavage opened 4 years ago

benjaminsavage commented 4 years ago

In evaluating this proposal, and providing feedback about how well it supports our various use-cases, it would be very helpful to have at least some guidance on the value of “T”.

I assume that Google is still doing research on this topic, and the value isn’t settled yet. As such, I don’t expect a precise answer. But a range would be helpful. Perhaps just a “90% confidence interval”, to be interpreted as “I, Charlie, estimate that the ultimate value of T will probably wind up lying between A and B with 90% confidence.”

I’m assuming there is a 100% probability that T will lie between 1 and 1,000,000.

I’m trying to understand if we should imagine T in the 100 to 1000 range, or if it’s more like the 10,000 to 100,000 range.

csharrison commented 4 years ago

Hey Ben, sorry this repo has gone a little unloved. We are exploring alternatives to simple thresholds, such as differential privacy, which can hopefully be used here. If we can achieve differential privacy in the central model (i.e. using some centralized aggregation service), then most of the privacy comes from adding small amounts of noise to the output. Thresholds are actually only necessary if the domain of the output is unknown beforehand (e.g. the labels in the histogram). Where thresholds are needed, they will be proportional to the noise we're adding to each bucket.

Of course, noise / thresholds all depend on differential privacy parameters (e.g. epsilon) that we'd need to discuss. We'll try to publish more information on these ideas soon.
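
To make the noise/threshold relationship above concrete, here is a minimal sketch of a central-model noisy histogram. It is an illustration under stated assumptions, not the mechanism this API will use: it assumes Laplace noise, assumes each user contributes a count of 1 to at most one bucket, and uses one common textbook threshold, 1 + ln(1/(2·delta))/epsilon. The names `noisy_histogram`, `threshold_for`, `known_labels`, and the `delta` parameter are hypothetical and introduced only for this example.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def threshold_for(epsilon, delta):
    """Assumed threshold: 1 plus a term proportional to the noise scale 1/epsilon."""
    return 1.0 + (1.0 / epsilon) * math.log(1.0 / (2.0 * delta))


def noisy_histogram(counts, epsilon, known_labels=None, delta=1e-5, seed=None):
    """Return noisy bucket counts (assumes sensitivity 1 per user).

    If known_labels is given, the output domain is fixed in advance, so every
    label (including empty ones) gets noise and no threshold is applied.
    Otherwise only observed labels are noised, and buckets whose noisy count
    falls below the threshold are dropped.
    """
    rng = random.Random(seed)
    scale = 1.0 / epsilon  # Laplace scale for sensitivity 1

    if known_labels is not None:
        return {
            label: counts.get(label, 0) + laplace_noise(scale, rng)
            for label in known_labels
        }

    cutoff = threshold_for(epsilon, delta)
    noisy = {label: c + laplace_noise(scale, rng) for label, c in counts.items()}
    return {label: value for label, value in noisy.items() if value >= cutoff}


if __name__ == "__main__":
    counts = {"campaign_a": 1000, "campaign_b": 2}
    print(threshold_for(epsilon=1.0, delta=1e-5))
    print(noisy_histogram(counts, epsilon=1.0, seed=0))
    print(noisy_histogram(counts, epsilon=1.0,
                          known_labels=["campaign_a", "campaign_b", "campaign_c"],
                          seed=0))
```

Under these assumptions, epsilon = 1 and delta = 1e-5 give a cutoff of roughly 11.8, and halving epsilon doubles both the noise scale and the part of the cutoff above 1. So in this sketch the effective "T" is driven by the privacy parameters rather than being a fixed constant.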