WICG / attribution-reporting-api

Attribution Reporting API
https://wicg.github.io/attribution-reporting-api/

aggregate reporting helper service threat model #157

Open erik-anderson opened 3 years ago

erik-anderson commented 3 years ago

The Attribution Reporting API and linked Multi-Browser Aggregation Service Explainer both discuss the use of multi-party computation with some helper services.

While there's still a lot to be determined about how helpers will be approved and what criteria they will need to meet, it would be helpful to understand the general threat model envisioned: in which scenarios we want to depend on technical enforcement, and in which scenarios we will rely on policy- and/or auditing-based mechanisms.

A concrete example to discuss: if the labels themselves are readable by each individual helper, but the true values cannot be reconstructed unless one helper service colludes with the other, have we met a reasonable bar both for users' privacy and for the concerns reporting origins might have about what helpers may learn about their business?
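For concreteness, here is a minimal sketch of the split being described, assuming a hypothetical two-helper scheme in which the value is additively secret-shared modulo 2^32 while the label is sent in the clear. This is an illustration of the general idea only, not the actual aggregation service wire format or protocol:

```python
import secrets

MODULUS = 2 ** 32  # hypothetical value domain; the real protocol may differ


def split_report(label: int, value: int):
    """Additively secret-share `value` between two helpers.

    The label is visible to both helpers (the scenario being discussed);
    the value can only be recovered by summing both shares mod MODULUS.
    """
    share_a = secrets.randbelow(MODULUS)
    share_b = (value - share_a) % MODULUS
    return (
        {"label": label, "value_share": share_a},  # sent to helper A
        {"label": label, "value_share": share_b},  # sent to helper B
    )


def reconstruct(share_a: int, share_b: int) -> int:
    """Only possible if the two helpers collude (or after aggregation)."""
    return (share_a + share_b) % MODULUS


report_a, report_b = split_report(label=0xDEADBEEF, value=42)
assert reconstruct(report_a["value_share"], report_b["value_share"]) == 42
```

In this sketch each helper alone sees the label plus a uniformly random share of the value, so the value is protected against a single non-colluding helper, but the label is not.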

Different folks will be worried about different threats, and will likely have differing opinions about their severity. It would be great to use this issue to discuss what those concerns are, so that research into possible protocols can be guided by a reasonable bar with respect to those threats.

csharrison commented 3 years ago

For an MPC system, the threat model should include a helper acting maliciously; otherwise, the case for doing MPC in the first place is less clear. I think it makes sense to accept reduced privacy in that regime, but not zero privacy.

This brings us to the example:

> A concrete example to discuss: if the labels themselves are readable by each individual helper, but the true values cannot be reconstructed unless one helper service colludes with the other, have we met a reasonable bar both for users' privacy and for the concerns reporting origins might have about what helpers may learn about their business?

I think our comfort with this should depend on the domain the labels are drawn from. In the simple "honest-but-curious" model, you can imagine one of the helpers simply leaking all the labels in the clear. In something like PCM, where there are ~4k labels, it may be fine to make those labels public (in PCM they are explicitly public to the caller without going through aggregation) since the labels are so coarse-grained. With large domains, however, this could clearly give up all privacy. In the aggregate explainer we discuss a 32-bit label domain, which alone (without values) could be enough to do large-scale tracking. Additionally, there have been discussions (e.g. the third bullet in https://github.com/WICG/conversion-measurement-api/issues/116) of even larger domains derived from arbitrary strings.
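As a rough back-of-the-envelope illustration of why the domain size matters (my own numbers, assuming labels are drawn roughly uniformly from the domain):

```python
import math

# Information carried by a single leaked label under a uniform-label assumption.
print(math.log2(4096))     # ~12 bits (a PCM-sized domain of ~4k labels)
print(math.log2(2 ** 32))  # 32 bits  (the aggregate explainer's label domain)

# ~33 bits are enough to uniquely identify one of ~8 billion people, so a
# leaked 32-bit label is close to a per-user identifier, while a ~12-bit
# label is shared by millions of users.
print(math.log2(8e9))      # ~33 bits
```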

One possible way of formalizing this: in an MPC system that aims to provide differential privacy, we'd like the DP bounds in the presence of a malicious helper to be within some constant factor of the ideal case where all helpers act honestly; achieving that would require hiding the labels.
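One way to write that requirement down (the notation here is mine, not from the explainer): if an honest run of the protocol releases an aggregate M satisfying ε-DP, then the full view V_i of any single malicious helper should still satisfy DP with a parameter at most a constant factor larger:

```latex
% Honest execution: the released aggregate M satisfies \varepsilon-DP.
\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S]
\quad \text{for all neighboring } D, D'.

% Requirement under one malicious helper: its entire view V_i should
% still satisfy DP within a constant factor c of the honest bound.
\Pr[V_i(D) \in S] \le e^{c \cdot \varepsilon} \Pr[V_i(D') \in S],
\qquad c = O(1).
```

If the labels are visible to each helper in the clear, the second condition cannot hold for large label domains, since the helper's view would then distinguish neighboring inputs essentially for free.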