WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/
Other
515 stars 216 forks source link

Fallback timing leak from k-anonymity failure #988

Open martinthomson opened 6 months ago

martinthomson commented 6 months ago

This text here suggests that a fallback from the same interest group could be re-bid if the winning ad doesn't reach a k-anonymity threshold:

Since a single interest group can carry multiple possible ads that it might show, the group will have an opportunity to re-bid another one of its ads to act as a "fallback ad" any time its most-preferred choice is below threshold.

After reading the specification, it appears as though the k-anonymity check is run for every ad in an auction the winning interest group if an original win fails the k-anonymity check. The bidding logic then only receives ads that have passed the threshold. This makes sense, in that it only leaks that the interest group is in an auction where an ad failed to reach a threshold and it only does that toward the k-anonymity service. There are real-time queries to the k-anonymity service, but those are limited.

I had to go poking around in Step Step 27.12.10.1.8 and (under that) Step 27.12.10.1.8.3.2.1 and Step 27.12.10.1.8.5.2.1 of the algorithm to find this out (those numbers: 😱).

This creates a pretty strong timing signal that the winning bid failed k-anonymity checks. I couldn't find information on how many ads are permitted in an interest group, but waiting for that many separate queries to a remote service will take a very different amount of time than just running the auction. It seems likely that sites will be able to learn when a k-anonymity threshold is not reached.

Now, if the theory is that sites could just ask the service themselves, that is perhaps true, but if the goal is to probe for the presence of a particular interest group, this seems like a pretty good attack that can leak one bit of information per auction.

Let's say that I want to check if someone has visited another site. With help from that site, an interest groups with a common name but multiple unique ad URLs that differ for every visitor to that site. This interest group will always bid high (at least on my site). On my site, I put that interest group into an auction with either locally generated bids or a separate interest group that bids low on a single very widely-used ad URL that has already passed k-anonymity checks. If the auction takes a short amount of time, then the person has not visited that other site. If the auction takes a long time then it is likely because the k-anonymity service is getting a workout because the person has visited the other site. Best case, the difference is two round trips to the service, which seems pretty observable.

Now make the provision of additional interest groups on the remote site conditional on other things (like bits in the user ID assigned to each visitor) and you have a pretty effective cross-site recognition widget.

michaelkleber commented 5 months ago

This is a bug in the spec: We don't call the k-anonymity server on the critical path of an auction, and we should update the spec to make clear that the k-anon status of ads are all known locally before the auction starts. Real-time queries would slow the auction down a bunch, as well as giving rise to the timing attack you've pointed out.

martinthomson commented 5 months ago

That implies a requirement that every ad in every interest group that might be part of an auction needs to have this information primed. (I say "might be part of an auction" here because the interest group selection logic means that the seller doesn't necessarily know if an interest group is going to be selected.) An auction might contain tens of DSPs, each of which have hundreds of interest groups, each of which has multiple ads. That's an awful lot of information to keep live.

michaelkleber commented 5 months ago

I don't think it's particularly burdensome. The storage isn't a problem; the Interest Group itself is holding far more data. And the Query operation has a cost because of the need to periodically re-check, but it only takes one query to the k-anon server to look up as many creative-hash statuses as fit in a URL.