csharrison / aggregate-reporting-api

Aggregate Reporting API

Clear privacy sandbox design rules #12

Open Pl-Mrcy opened 4 years ago

Pl-Mrcy commented 4 years ago

This question relates not only to this API but also to the other proposed APIs (such as the Conversion Measurement API), other reporting schemes (such as another API mentioned in #6 but never detailed further), and the entire TURTLEDOVE framework.

In this repository and others (such as in here for example), you lay out several reporting frameworks, even considering different gradations of the user-privacy guarantees we are aiming to protect (differential privacy, local differential privacy, k-anonymity, etc.).

Would it be possible to specify the exact requirements you expect from the Privacy Sandbox? Could you also give at least an order of magnitude for each of the related variables (minimum cohort size, minimum number of identical reports, epsilon if we are to consider differential privacy, etc.)?

I understand that protecting user privacy requires the entire system to be attack-proof and that you can't fix one variable without fixing the others. I also understand that these numbers are not set in stone and are open to discussion. However, having a rough idea is necessary for us to get a clear picture and propose relevant amendments. Taking an extreme example, a differentially private world would look radically different to advertisers depending on whether the value of epsilon is 2 or 200.
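To make the epsilon comparison concrete, here is a minimal sketch of the standard Laplace mechanism for a differentially private count. Nothing here is a proposed Privacy Sandbox mechanism; it only illustrates how the noise scale shrinks as epsilon grows, which is why epsilon 2 vs. 200 matters so much in practice:

```python
import math
import random

def laplace_sample(scale):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count, epsilon, sensitivity=1.0):
    """Epsilon-DP count via the Laplace mechanism: noise scale b = sensitivity / epsilon."""
    return true_count + laplace_sample(sensitivity / epsilon)

# Typical noise magnitude: stddev of Laplace(0, b) is b * sqrt(2).
for eps in (2, 200):
    b = 1.0 / eps
    print(f"epsilon={eps}: noise scale b={b}, stddev ~ {b * math.sqrt(2):.4f}")
```

At epsilon = 2 the added noise on a single count has a standard deviation around 0.7; at epsilon = 200 it is around 0.007, i.e. essentially exact reporting with a much weaker privacy guarantee.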

Do you also intend to provide a POC we could play around with at some point? Do you have any ETA in mind?

csharrison commented 4 years ago

Hi, I think most of our thoughts for aggregate measurement are summarized in https://github.com/WICG/conversion-measurement-api/blob/master/SERVICE.md which makes some of this explainer out of date. We should update the explainer to reference the service document as a potential mechanism for ensuring private reports.

In terms of a POC, we're still iterating on designs so we aren't quite at the implementation phase. We announce the beginning of the prototype phase in blink-dev Intent To Prototype threads: https://groups.google.com/a/chromium.org/forum/#!forum/blink-dev

That's where we would send an email describing our intent to prototype one of these measurement features.

ablanchard1138 commented 4 years ago

Hi,

Thank you for sharing these insightful links, but they contain no clear requirements or precise figures for the degree of privacy.

I think such details are needed if we want to make progress collectively on the Privacy Sandbox. They are necessary if we want to compare the proposals not only on their theoretical merits, but also on the practical level of privacy they provide.

No system is 100% bulletproof, as you said on DP:

You still leak data over time, but it becomes quantifiable and bounded.

Knowledge of these parameters would help quantify this leakage. It would also help publishers, marketers, and brands understand what their actual reporting and day-to-day operations will look like in less than two years.

We did find a few numbers in this paper, linked from one of the pages you shared: https://privacytools.seas.harvard.edu/files/privacytools/files/pedagogical-document-dp_new.pdf

Although guidelines for choosing ε have not yet been developed, they are expected to emerge and evolve over time, as the expanded use of differential privacy in real-life applications will likely shed light on how to reach a reasonable compromise between privacy and accuracy. As a rule of thumb, however, ε should be thought of as a small number, between approximately 1/1000 and 1.

Albeit not very precise, could you at least confirm if this is the ballpark you have in mind?

csharrison commented 4 years ago

I'll reiterate what I said on the web-adv call here (sorry for the delay). Precise ranges for DP parameters will likely depend on utility (we don't want to launch something that is not useful). The numbers that are referenced in that paper could be seen as a reasonable starting point for experimentation, though we are also exploring larger values too.

Ideally we would be able to measure or estimate a "privacy frontier" by graphing privacy vs. utility and picking a spot that we think is a reasonable compromise.
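A minimal sketch of one axis of that frontier for a Laplace-noised count: expected absolute error as a function of epsilon. The epsilon grid below is purely illustrative (it spans the paper's 1/1000-to-1 rule of thumb plus a larger value), not a set of proposed parameters:

```python
# Hypothetical epsilon grid: the paper's rule-of-thumb range (1/1000 to 1)
# plus a larger value, as mentioned above. Illustrative only.
epsilons = [0.001, 0.01, 0.1, 1.0, 10.0]

def expected_abs_error(epsilon, sensitivity=1.0):
    """Expected |noise| for the Laplace mechanism: E|Laplace(0, b)| = b = sensitivity / epsilon."""
    return sensitivity / epsilon

for eps in epsilons:
    print(f"epsilon={eps:>6}: expected absolute error on a count = {expected_abs_error(eps):.1f}")
```

Plotting utility (e.g. relative error on typical campaign counts) against epsilon on such a grid is one way to pick the "reasonable compromise" point described above.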
