WICG / privacy-preserving-ads

Privacy-Preserving Ads

Publisher–SSP (or Publisher-DSP) collusion risk #11

Open michaelkleber opened 3 years ago

michaelkleber commented 3 years ago

It seems like at the time of the ad request, the publisher knows the true contextual signals C, and the SSP learns the anonymized version C'. But as you point out in the Timing correlation section of your threat analysis, the need for a real-time ad request and response lets both of them join their respective signals with a timestamp whose resolution is on the order of the duration of the ad request (surely a single-digit number of seconds, no matter how willing we are to inject latency).

That makes it seem inevitable that a colluding publisher and SSP would be able to learn the collection of all the user/etc S' signals associated with a single user.
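
To make the join concrete, here is a minimal sketch of how two colluding parties might line up their logs by timestamp. The log layout, the `join_by_time` helper, and the 5-second window are all illustrative assumptions, not anything defined by the proposal.

```python
from bisect import bisect_left, bisect_right

def join_by_time(publisher_log, ssp_log, window_s=5.0):
    """Link publisher records to SSP records seen within window_s seconds.

    publisher_log: list of (timestamp, first_party_user_id)  -- hypothetical layout
    ssp_log:       list of (timestamp, noised_signal_id)     -- hypothetical layout
    """
    ssp_sorted = sorted(ssp_log, key=lambda r: r[0])
    ssp_times = [t for t, _ in ssp_sorted]
    links = {}
    for t, user_id in publisher_log:
        lo = bisect_left(ssp_times, t - window_s)
        hi = bisect_right(ssp_times, t + window_s)
        candidates = [ssp_sorted[i][1] for i in range(lo, hi)]
        # Only keep unambiguous matches: exactly one SSP record in the window.
        if len(candidates) == 1:
            links.setdefault(user_id, []).append(candidates[0])
    return links

publisher_log = [(100.0, "alice"), (160.0, "bob"), (230.0, "alice")]
ssp_log = [(101.2, "s1"), (161.0, "s2"), (231.5, "s3")]
links = join_by_time(publisher_log, ssp_log)
print(links)
```

On a busy site several requests may land in the same window, so a single join can be ambiguous; the concern is that repeated requests from the same user narrow this down over time.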

I believe this means that over time, we need to assume the publisher could recover the un-noised (DP-free) user signals.
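
One way this could happen, sketched under the assumption that each request carries freshly drawn, zero-mean noise on a numeric signal: averaging many linked observations converges on the true value. The Laplace noise, its scale, and the helper names below are illustrative only; noise that is kept consistent across requests would not average out this way.

```python
import random

def noised(true_value, scale, rng):
    # Laplace(0, scale) noise, built as the difference of two exponentials.
    return true_value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def estimate(observations):
    # The sample mean of independent zero-mean-noised values approaches the true value.
    return sum(observations) / len(observations)

rng = random.Random(0)
true_signal = 0.7  # hypothetical un-noised user signal
observations = [noised(true_signal, 1.0, rng) for _ in range(5000)]
estimate_value = estimate(observations)
print(abs(estimate_value - true_signal))
```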

mehulparsana commented 3 years ago

Thanks for raising the concern; I acknowledge the risk.

Time-based linkage uses <time window, IP, UA, publisher context> to link the two requests (one to the publisher's web server, one to the SSP). We have introduced two constructs in the flow to prevent this:

  1. Context anonymization for the PARAKEET request: a construct that reduces the granularity of contextual information and passes only signals that pass a k-anonymity test within a time epoch. The privacy test for contextual signals will run continuously to guard against adversarial attacks. The system can also introduce spurious ad requests with a random but consistent fake user feature vector s' to limit deterministic learning of user features. As a result, a publisher and SSP trying to determine the true user feature vector will find it hard to do so.

  2. Proxying the ad request: this construct rotates IP and UA to reduce fingerprinting. We have explained how to preserve location and device semantics at some granularity.
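
As a rough illustration of the first construct, a k-anonymity test within a time epoch could look like the sketch below. The request layout, the one-hour epoch, and the `k_anonymous_contexts` helper are assumptions made for this example; the actual PARAKEET test may differ.

```python
from collections import defaultdict

def k_anonymous_contexts(requests, k, epoch_s=3600):
    """Return the (epoch, context) pairs seen from at least k distinct users.

    requests: iterable of (timestamp, user_id, context)  -- hypothetical layout
    """
    users_per_key = defaultdict(set)
    for timestamp, user_id, context in requests:
        epoch = int(timestamp // epoch_s)
        users_per_key[(epoch, context)].add(user_id)
    return {key for key, users in users_per_key.items() if len(users) >= k}

requests = [
    (10, "u1", "sports"), (20, "u2", "sports"), (30, "u3", "sports"),
    (40, "u1", "sports/local-team-x"),  # too specific: only one user
    (3700, "u4", "sports"),             # next epoch: only one user so far
]
allowed = k_anonymous_contexts(requests, k=3)
print(allowed)
```

A context that fails the test would be coarsened or withheld before it reaches the SSP.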

It may be obvious, but I would like to explain one more aspect of the user features s'. User features are encrypted using DSP keys. An attack would need collusion between the publisher, the SSP, and multiple DSPs (who can decrypt user features) to retrieve and denoise user features.

Please note that these mitigations do not eliminate all risk of adversarial attacks. We propose these constructs to make such attacks non-deterministic and expensive.

KeldaAnders commented 3 years ago

@michaelkleber we wanted to give you a heads up that we've added this issue to the agenda for the Wednesday, April 21st at 5 pm CEST / 11 am EDT / 8 am PDT discussion.

michaelkleber commented 3 years ago

Thanks, I'll be there! I think MaCAW is already a great answer to this question, but I look forward to more discussion.

jdelhommeau commented 2 years ago

Trying to better understand the issue here. Eventually, IG will be encrypted and only readable by the DSP. So how would the SSP and publisher be able to collude to get the user's IG?

michaelkleber commented 2 years ago

The problem here is the risk of de-anonymization for anyone who receives anonymized ad targeting information. The noised version of the user signals (called S' here) is inevitably made visible to someone in real time, whether it's the SSP or just the DSPs due to encryption. If that party works with the publisher, then I don't believe there is any way to prevent them from figuring out which contextual-vs-personalized pair of records belong to the same person.

The key problem, as discussed above, is the timestamp associated with the ad request. But it's actually worse than that: suppose that somehow we could blur the timestamp to only 1-minute accuracy. Even so, if a particular user loads n different pages on a particular website, then both the publisher and the DSP can associate a list of n timestamps with that person's first-party identity (in the publisher's case) or that person's noised targeting signals (in the DSP's case). To avoid that you would need the DSP-visible signals S' to be quite different on each subsequent ad request — so different that a DSP wouldn't be able to cluster the various S' from a particular website into users! This again seems infeasible.
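
The multi-page argument can be sketched as follows, assuming both parties see the same 1-minute bucket for each request and that the DSP can already cluster the S' it receives from one website. The data layout and helper names are hypothetical.

```python
def minute_buckets(timestamps):
    # Blur each timestamp (in seconds) to 1-minute resolution.
    return frozenset(int(t // 60) for t in timestamps)

def match_profiles(publisher_visits, dsp_clusters):
    """Match first-party users to DSP-side clusters by their blurred visit pattern."""
    by_pattern = {}
    for cluster_id, times in dsp_clusters.items():
        by_pattern.setdefault(minute_buckets(times), []).append(cluster_id)
    matches = {}
    for user_id, times in publisher_visits.items():
        hits = by_pattern.get(minute_buckets(times), [])
        if len(hits) == 1:  # keep only unambiguous matches
            matches[user_id] = hits[0]
    return matches

# Two users overlap in two of their three minutes, yet the full patterns differ.
publisher_visits = {"alice": [5, 130, 610], "bob": [20, 130, 900]}
dsp_clusters = {"c1": [7, 131, 612], "c2": [22, 133, 901]}
matches = match_profiles(publisher_visits, dsp_clusters)
print(matches)
```

Each additional page load adds another bucket to the pattern, so the chance that two users share the whole pattern shrinks as n grows.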

jdelhommeau commented 2 years ago

Thank you for clarifying, @michaelkleber. I got confused because you were mentioning the SSP and context signals (C and C'), but I understand that a DSP colluding with the publisher through a timing attack is a different issue.

michaelkleber commented 2 years ago

Yes, good point, apologies for the name of the issue being perhaps too narrow! I'll go update it for clarity.