Referrer can learn the websites user has visited in the past

ghost commented 3 years ago

Similar to https://github.com/buettner/private-prefetch-proxy/issues/7, with Private Prefetch Proxy and Speculation Rules, any website on the Internet can determine the set of websites a specific user has visited in the past. These attacks do not require any collusion between the proxy and the referrer.

There are two possible attacks. I alluded to them in https://github.com/buettner/private-prefetch-proxy/issues/7#issuecomment-812184839, but I realized later that I had commented on the wrong issue: that issue is about the proxy, not the referrer, learning user history. The attacks mentioned in that issue also require the referrer to control the network, unlike the two attacks below:

Let's say that the referrer knows that the browser prefetches the first N hints for sites that are unvisited, and it wants to determine whether a specific user has visited foo.com in the past. To do so, the referrer inserts N-1 prefetches for randomly generated origins that the user is guaranteed not to have visited, followed by a prefetch for foo.com, followed by a prefetch for referrer.com/unique_decorated_url.

Now the referrer would see a fetch for referrer.com/unique_decorated_url in its logs if and only if the user has visited foo.com in the past. If the user has visited foo.com, the browser skips that hint, which leaves the Nth slot free for referrer.com/unique_decorated_url. If the user has not visited foo.com, the foo.com prefetch takes the Nth slot and the referrer URL is never fetched.
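To make the first attack concrete, here is a minimal Python sketch of it. Everything in it is a hypothetical model, not Chrome's actual code: `simulated_prefetch` assumes the browser prefetches the first N hints whose host is unvisited (the behavior described above), the `.example` decoy hosts and the probe URL are made up, and the model assumes the browser does not skip the referrer's own probe URL.

```python
import secrets
from urllib.parse import urlsplit


def build_hints(target_host, n, probe_url):
    """Referrer side: N-1 random decoys, then the target, then the probe."""
    decoys = [f"https://{secrets.token_hex(8)}.example/" for _ in range(n - 1)]
    return decoys + [f"https://{target_host}/", probe_url]


def simulated_prefetch(hints, visited_hosts, n):
    """Hypothetical browser model: fetch the first n hints for unvisited hosts."""
    fetched = []
    for url in hints:
        if len(fetched) == n:
            break
        if urlsplit(url).hostname in visited_hosts:
            continue  # visited sites are skipped and do not use up a slot
        fetched.append(url)
    return fetched


def referrer_infers_visited(fetched_urls, probe_url):
    """Referrer side: the probe shows up in the logs only if the target was skipped."""
    return probe_url in fetched_urls


probe = "https://referrer.com/unique_decorated_url"
hints = build_hints("foo.com", 3, probe)

# User who has visited foo.com: the hint is skipped, so the probe is fetched.
print(referrer_infers_visited(simulated_prefetch(hints, {"foo.com"}, 3), probe))
# User who has not: foo.com takes the last slot, so the probe is never fetched.
print(referrer_infers_visited(simulated_prefetch(hints, set(), 3), probe))
```

In this model the single bit the referrer reads from its logs is exactly the visited/unvisited status of foo.com.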

The attack described above can be varied slightly to use fetch timing values available on the server instead of relying on knowledge of N. Again, let's say that the referrer wants to determine whether a specific user has visited foo.com in the past. To do so, the referrer inserts prefetches for referrer2.com/unique_decorated_url_1, followed by foo.com, followed by referrer2.com/unique_decorated_url_2. Here, referrer2.com is an origin controlled by referrer.com.

Now the referrer compares the timestamp at which the fetch of unique_decorated_url_1 completed with the timestamp at which the fetch of unique_decorated_url_2 started. This time gap can be used to determine whether the browser spent any time prefetching foo.com, or whether it skipped foo.com and went directly from the first hint to the third.

The timestamps would be close if and only if the user has visited foo.com in the past, since that is the case in which the browser skips the foo.com prefetch. Determining closeness is not that hard, since the referrer would know the current RTT from the user to the proxy based on the prefetches the user made to the referrer2.com links.
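The timing variant can be sketched the same way. This is only an illustration under stated assumptions: prefetches are issued one after another, timestamps are server-side seconds, and the 1.5 x RTT threshold is an arbitrary value chosen for the example, not a measured one. The function name and parameters are hypothetical.

```python
def infer_visited_from_gap(url1_completed, url2_started, rtt, threshold_factor=1.5):
    """Guess whether foo.com was skipped (i.e. previously visited).

    url1_completed: server timestamp when unique_decorated_url_1 finished.
    url2_started:   server timestamp when unique_decorated_url_2 began.
    rtt:            round-trip time estimated from the referrer2.com fetches.
    threshold_factor is an illustrative assumption: a gap well above one RTT
    suggests the browser spent time prefetching foo.com in between.
    """
    gap = url2_started - url1_completed
    return gap < threshold_factor * rtt


# Gap of about one RTT: the browser went straight from hint 1 to hint 3.
print(infer_visited_from_gap(10.00, 10.06, rtt=0.05))
# Much larger gap: the browser likely prefetched foo.com in between.
print(infer_visited_from_gap(10.00, 10.40, rtt=0.05))
```

Note that nothing here depends on N; the referrer only needs its own server logs and an RTT estimate.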

Note that unlike the attacks mentioned in https://github.com/buettner/private-prefetch-proxy/issues/7#issue-810339403, these two attacks can be carried out by Google websites alone, with no way for the browser or the proxy to detect them. So these attacks are not subject to Chrome's whitepaper and privacy policy mentioned in https://github.com/buettner/private-prefetch-proxy/issues/7#issuecomment-854843083.

buettner commented 3 years ago

Thanks for the feedback! We'll definitely keep these in mind.

Once we have experimental results, we will be in a better place to come up with a design that mitigates these concerns. For example, N could include visited sites, and prefetches could happen in parallel.

ghost commented 3 years ago

Thanks, that makes a lot of sense. As I mentioned in my comment above, knowledge of N is not critical to this attack, but your two suggestions combined should help.

ghost commented 2 years ago

Are there any updates to share here? Wondering if it's now safe to turn on the Preload setting in Chrome or not.

buettner commented 2 years ago

See https://github.com/buettner/private-prefetch-proxy/issues/7 for more details on how we've addressed the issue.

Note that this feature is still in the experiment phase. Currently, it is still only enabled for a small fraction of user sessions.

(Closing this bug as a dupe of the other.)

RobertMiller23 commented 2 years ago

Hi, can you explain how the attacks described above are handled by mitigations listed in #7? My understanding is that #7 mentions mitigations for threat models where the referrer is either colluding with the network operator OR the referrer is colluding with the proxy.

In the threat model described above, the referrer does not need either of them. Instead, the referrer introduces a speculation rule to prefetch a URL that it controls. Does the difference make sense?

buettner commented 2 years ago

Can you explain your threat model again, and highlight how the mitigation in #7 does not mitigate them?

The key mitigation is that Chrome now makes the prefetch request even if the user has visited the site before. Note that that is a behavior change from when this comment was initially posted.

The initial post on this thread mentions, "would see a fetch .. iff only if the user has visited" and "or if the browser skipped fetching of foo.com and directly went from prefetching first hint to the third." -- neither of those happens now.
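As a rough illustration of why that behavior change closes the channel, here is a small Python sketch. It is a simplified, hypothetical model and not Chrome's implementation: `prefetch_ignoring_history` simply takes the first N hints whether or not their hosts were visited, and the hint URLs are made up.

```python
def prefetch_ignoring_history(hints, visited_hosts, n):
    """Hypothetical model of the mitigated behavior.

    visited_hosts is accepted but deliberately unused: the set of prefetched
    URLs depends only on the hint list and n, never on the user's history.
    """
    return hints[:n]


hints = [
    "https://decoy-1.example/",
    "https://decoy-2.example/",
    "https://foo.com/",
    "https://referrer.com/unique_decorated_url",
]

with_history = prefetch_ignoring_history(hints, {"foo.com"}, 3)
without_history = prefetch_ignoring_history(hints, set(), 3)

# Both users issue the same requests, so the referrer's logs look identical.
print(with_history == without_history)
```

Under this model the probe URL is either always fetched or never fetched for a given hint list, so its presence in the logs says nothing about foo.com.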

buettner / private-prefetch-proxy

Referrer can learn the websites user has visited in the past #20