buettner / private-prefetch-proxy

Proposal to use a CONNECT proxy to obfuscate the user IP address for privacy-enhanced prefetching.

How does this handle Cache-Control directives? #18

Closed by robrwo 2 years ago

robrwo commented 3 years ago

I maintain a website that has pages and some content that is customised for a user based on IP/geo-location as well as whether the user is logged in (using session cookies). The site's pages set Cache-Control and Vary headers accordingly.

I am concerned that allowing a prefetch proxy may mean that content intended for anonymous users, or users at a different geographic location, will be used by a user that is logged in but using such a proxy. Because of that, I've disabled it on the site until some of these issues are made clear.

The page at https://github.com/buettner/private-prefetch-proxy/blob/main/README.md suggests we look for the "Purpose: prefetch" headers. How are these requests to be rejected? HTTP 403?

Rejecting all prefetch requests seems rather blunt. How do we distinguish between browser prefetching in the background vs a privacy or prerendering proxy?

buettner commented 3 years ago

The proxy cannot cache resources. The connection between Chrome and the destination web server is end-to-end encrypted; the CONNECT proxy just passes encrypted bytes between the two endpoints and cannot inspect the content. So with respect to Cache-Control and Vary directives, the proxy can't see those headers and can't do any caching. Does that help? I worry I misunderstood your question...

For the logged-in case, assuming logged-in state is tracked using a cookie, the prefetch feature won't provide any benefit for those users. Chrome can't send the cookie on a prefetch request, as that would not be private, but also can't use the response from a request made without the cookie because the page may vary on cookie and the user would see the wrong page. We are exploring mechanisms by which a site can tell Chrome that it supports uncredentialed prefetches even when the user has an existing cookie, but that is not supported yet.

> Rejecting all prefetch requests seems rather blunt.

Instead of looking for the Purpose: prefetch header, private prefetching can be disabled using the traffic-advice file. But maybe the concern is that you don't want to block all prefetches from the proxy, just some requests? Other than checking for the proxy's IP addresses, there is no easy way to distinguish between prefetches that go directly to the website and prefetches that are sent via the proxy.
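For reference, here is a minimal sketch of generating such a traffic-advice file. It assumes the file is a JSON array served from `/.well-known/traffic-advice` with an entry naming the `prefetch-proxy` user agent, which is my reading of the proposal README; please check the README for the current format before relying on it. The function name `traffic_advice` is made up for this example.

```python
import json


def traffic_advice(disallow: bool = True) -> str:
    """Build the body of a traffic-advice file (hypothetical helper).

    Assumption: the proxy reads a JSON array of entries keyed by
    "user_agent", and "disallow": true turns private prefetching off.
    """
    entry = {"user_agent": "prefetch-proxy", "disallow": disallow}
    return json.dumps([entry])


if __name__ == "__main__":
    # Write this output to the path the proposal specifies
    # (assumed here to be /.well-known/traffic-advice).
    print(traffic_advice())
```

This switches off private prefetching for the whole origin; it does not help if only some URLs should be excluded.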

> ... between browser prefetching in the background vs a privacy or prerendering proxy?

Note that all prefetches are triggered by a website explicitly asking for the prefetches to happen, e.g., using the Speculation Rules API. That API optionally allows websites to add the "anonymous-client-ip-when-cross-origin" requirement, in which case the prefetches are sent via the proxy.
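To make that concrete, a referring site could emit its speculation rules roughly as below. This is only a sketch: the helper names and the example URL are hypothetical, and the rule shape (a `prefetch` list with `source`, `urls`, and `requires`) reflects my understanding of the Speculation Rules API at the time of writing.

```python
import json


def speculation_rules(urls, via_proxy=True):
    """Return speculation rules JSON for a list of URLs to prefetch.

    When via_proxy is True, the rule carries the
    "anonymous-client-ip-when-cross-origin" requirement, so the
    prefetches are expected to go through the proxy.
    """
    rule = {"source": "list", "urls": list(urls)}
    if via_proxy:
        rule["requires"] = ["anonymous-client-ip-when-cross-origin"]
    return json.dumps({"prefetch": [rule]}, indent=2)


def script_tag(urls, via_proxy=True):
    """Wrap the rules in a <script type="speculationrules"> element."""
    # Escape "</" so a URL cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = speculation_rules(urls, via_proxy).replace("</", "<\\/")
    return '<script type="speculationrules">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(script_tag(["https://example.com/article"]))
```

Without the `requires` entry, the same rule describes an ordinary prefetch that is not routed through the proxy.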

> How are these requests to be rejected? HTTP 403?

Any non-200 response code will work, but we should provide specific guidance -- I will update the proposal text after I think through it a bit.
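Until there is specific guidance, a server-side check could look something like the sketch below. It rejects only prefetches for paths the site treats as personalised, rather than all prefetches. The function names, the `/account` prefix, and the choice of 403 are all assumptions for illustration; as noted above, any non-200 status should have the same effect.

```python
from typing import Mapping, Tuple


def is_prefetch(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries a "Purpose: prefetch" header.

    Header names are compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "purpose" and value.strip().lower() == "prefetch":
            return True
    return False


def status_for(
    headers: Mapping[str, str],
    path: str,
    personalised_prefixes: Tuple[str, ...] = ("/account",),  # hypothetical
) -> int:
    """Pick a status code: refuse prefetches of personalised paths."""
    if is_prefetch(headers) and path.startswith(personalised_prefixes):
        return 403  # assumption: any non-200 code would work here
    return 200


if __name__ == "__main__":
    print(status_for({"Purpose": "prefetch"}, "/account/settings"))
    print(status_for({"Purpose": "prefetch"}, "/news"))
    print(status_for({}, "/account/settings"))
```

Note that this check alone cannot tell a direct prefetch from one sent via the proxy, since both carry the same header.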

Please let me know if that information addresses your concerns, or if you have new questions. Sorry if I misunderstood something.