Open BasileLeparmentier opened 4 years ago
That's great, Basile, thank you — I believe those are all feasible, though of course there is work to be done in hammering out the details.
Hi Michael,
I'm not sure I was clear (I see now that it isn't obvious this is just an introduction to a document), but there is a full document detailing this proposal: https://github.com/BasileLeparmentier/SPARROW/blob/master/Reporting_in_SPARROW.md
Sorry for the lack of clarity. Best, Basile
Yikes, sorry, I completely missed that! I'll look at the details soon.
Your comment still stands, and even with the full document there will be details to be hammered out, though fewer than you might have thought when you took the introduction for the whole proposal^^.
Best
Sorry for the delay on a more detailed response!
At a high level, my reactions are:
I still think we will want the "privacy-preserving mechanisms" to include something like differential privacy rather than the k-anonymity that you use in your examples. But this is a pretty well-studied problem (the database being queried is all in one place, the number of queries is limited, etc), so we can implement a best-in-class solution here.
I'm very unsure about privacy relying on "we propose that access to the reports be conditioned to a legally binding agreement that those two sources of data are never crossed". I can't see how the Gatekeeper would be in a position to know whether two parties it gave reports to were sharing information, and I don't want to even imagine what this might imply if, say, one company purchased another. I think we will be on much safer ground if we keep the threat model where the Gatekeeper pessimistically assumes that everyone is sharing the reports it gives out, and make the reports private enough that we don't mind.
I'll open separate issues about more specific questions.
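For concreteness, here is a minimal sketch of the kind of differentially private noise addition being advocated above (the Laplace mechanism applied to a count query). The function names and parameters are illustrative only, not part of any proposal:

```python
import math
import random

def laplace_noise(scale):
    """Sample from a Laplace(0, scale) distribution via the inverse CDF."""
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Return a differentially private count: the true count plus Laplace
    noise with scale sensitivity / epsilon. A smaller epsilon means more
    noise and therefore stronger privacy."""
    return true_count + laplace_noise(sensitivity / epsilon)
```

The key property is that the noise distribution is calibrated to how much one user can change the answer (the sensitivity), so an adversary who sees the noisy report cannot confidently infer any individual's contribution, even across colluding recipients.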
Hi Michael,
Thanks a lot for your feedback.
On your first point, we did propose k-anonymity because we don't think differential privacy is well suited to the use cases of online advertising. This is a complex topic and we intend to publish a blog post explaining our position, but the gist is that differential privacy is ill-suited when:
In this new reporting scheme, the privacy leaks are very close to zero, so legal agreements are not necessary. We think that solving this multi-faceted problem using solely technical means would lead to an unsatisfying solution. While I, a techie, have the same tendency to try to engineer my way out first, sometimes the last mile is handled significantly better via other means. Therefore, we should not forget that other tools are available, and that other industries handle privacy issues with Chinese walls and compliance processes rather than sheer engineering power (by the way, Google also uses this type of scheme if I am not mistaken: despite belonging to the same company, Google Ads does not have access to the browsing ID and the full browsing history of the user for targeting purposes). We should adopt the same methods when the remaining corner cases are small enough and the cost of handling them through engineering outweighs the benefit.
This legal agreement / set of predefined rules would only be there to cover the last bits that were not handled technically.
Asking the DPAs to be in charge of auditing any legal points that might be used for TURTLEDOVE/SPARROW could be an option.
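For reference, a minimal sketch of the k-anonymity guarantee this thread contrasts with differential privacy: a report satisfies k-anonymity when every combination of identifying attributes appears at least k times. The field names and helper functions below are illustrative, not SPARROW's specified mechanism:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows, i.e. no report row is rarer than k."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

def suppress_rare_rows(rows, quasi_identifiers, k):
    """Drop rows whose quasi-identifier combination occurs fewer than
    k times -- one simple way a reporting endpoint could enforce
    k-anonymity before releasing data."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return [row for row in rows
            if groups[tuple(row[q] for q in quasi_identifiers)] >= k]
```

Unlike differential privacy, this releases exact values for the surviving rows; the protection comes entirely from each released row being indistinguishable among at least k individuals, which is why the two approaches trade off accuracy and privacy so differently.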
OK, I look forward to your blog post, and further discussion on differential privacy and other approaches to meet the privacy needs.
I am indeed very interested in both technical and policy approaches — and of course trusting a Gatekeeper is itself a policy choice, from the browser's point of view. But as policy-type solutions go, I don't particularly like giving out information with privacy properties that depend on forbidding collusion.
Hi Michael,
Sorry for the delay, I was on vacation. You can find our blog post on differential privacy, and why we think it has strong limitations in the case of online advertising, here: https://github.com/Pl-Mrcy/privacysandbox-reporting-analyses/blob/master/differential-privacy-for-online-advertising.md
Best, Basile
Hi,
Thanks to the much constructive feedback on the SPARROW proposal, we are happy to propose a new version of the reporting capabilities. We believe this proposal improves on securing users' privacy without much compromise on the advertising use cases that SPARROW aims to preserve. To do so, we replaced log-level reporting with three different levels of reporting, each trading off granularity and delay to serve different advertising use cases:
With this actionable proposal, which should be precise enough to be implemented, we believe we address the concerns about privacy attacks on SPARROW, satisfying the Privacy Sandbox requirements, preserving most of the ecosystem's current capabilities, and ultimately allowing for a fair, thriving, advertising-backed Open Web. Once again, we thank the community in advance for their feedback, as it will help bolster the SPARROW proposal.
Detailed document can be found here: https://github.com/BasileLeparmentier/SPARROW/blob/master/Reporting_in_SPARROW.md