Always fetch the origin policy on every request

annevk commented 5 years ago

Thinking more about some of the privacy issues I'm wondering if we should require HTTP/2 or later and have a fixed URL for the policy. That way we might be able to address some of the performance issues by fetching the policy in parallel with whatever is requested from that origin.

(If we assume that everyone eventually needs a policy we could even do away with the response header and use 4xx / 200 + application/json as signal, plus HTTP cache semantics for updates?)
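To make the "4xx / 200 + application/json as signal" idea concrete, here is a minimal sketch. The helper name `interpret_policy_response` is hypothetical, and treating every 4xx as "no policy" and everything else as an error is an assumption; nothing in this thread specifies it.

```python
import json
from typing import Optional


def interpret_policy_response(status: int, content_type: str, body: bytes) -> Optional[dict]:
    """Return the parsed policy, or None when the origin signals "no policy".

    Assumption: 200 + application/json means a policy is present and any
    4xx means there is none. Every other outcome is treated as an error.
    """
    if 400 <= status < 500:
        return None
    # Drop media type parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status == 200 and media_type == "application/json":
        policy = json.loads(body)
        if not isinstance(policy, dict):
            raise ValueError("policy must be a JSON object")
        return policy
    raise ValueError(f"unexpected policy response: {status} {content_type!r}")
```

With this shape the response header is not needed to discover the policy; the status code and media type carry the signal.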

michael-oneill commented 5 years ago

I like it. Legacy HTTP would still work, but with added latency the first time (presumably the policy would be cached with the target). Encourages people to move to HTTP/2.

domenic commented 5 years ago

Is the implication of this that every HTTP/2 request to an origin for which we haven't yet cached an origin policy is now accompanied by a second request (on the same connection) to /.well-known/origin-policy? (Or maybe just navigation requests?)

That feels expensive, but perhaps that is my HTTP/1 brain thinking...

domenic commented 5 years ago

Never mind. I wrote this out in more detail in #47 and I can see how to avoid the extra request in many cases.

However in my writeup I didn't see any reason to restrict this to HTTP/2. It'll just be slower on HTTP/1 since round-trips are more costly and push doesn't exist. That seems fine.

domenic commented 5 years ago

In https://github.com/WICG/origin-policy/pull/47 @annevk said:

> FWIW, my idea behind requiring H/2 or higher was that we would immediately fetch the policy in parallel with fetching url as sending out an additional request over H/2 that results in a 404 is not that expensive (I think).

Would you do this on every request? (Maybe every navigation request?) Would you do it even if we have cached an origin policy, in order to get potential updates (on the theory that 304s are also cheap)?
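As a rough sketch of what "fetch the policy in parallel with fetching url" could look like: the `fetch` callable below is a hypothetical stand-in for a real HTTP client, and the path is the `/.well-known/origin-policy` location mentioned earlier in the thread. Whether both requests actually share one H/2 connection depends on the client, which this sketch does not model.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple
from urllib.parse import urlsplit

POLICY_PATH = "/.well-known/origin-policy"

# Hypothetical fetch signature: URL in, (status, body) out.
Fetch = Callable[[str], Awaitable[Tuple[int, bytes]]]


def policy_url_for(url: str) -> str:
    """Build the fixed policy URL for the origin of `url`."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{POLICY_PATH}"


async def fetch_with_policy(url: str, fetch: Fetch) -> List[Tuple[int, bytes]]:
    """Issue the main request and the policy request concurrently.

    Returns [main_response, policy_response].
    """
    return await asyncio.gather(fetch(url), fetch(policy_url_for(url)))
```

A cached-policy check would sit in front of `fetch_with_policy` if the answer to "on every request?" turns out to be no.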

annevk commented 5 years ago

Ideally, I think it would be for each new origin that the session encounters, starting with the top-level origin. And ideally it's also up-to-date, but perhaps there needs to be room for configuration there down the line. I'm not sure how realistic this is, but I wanted to throw the idea out there as I rather like the simplicity of it.

domenic commented 5 years ago

> And ideally it's also up-to-date

How would you accomplish this part?

annevk commented 5 years ago

@domenic oh sorry, that was meant as a yes to your suggestion. And we could use normal HTTP cache semantics + scope for which the policy won't be updated anyway (if a document stays open for hours and we decide policies are immutable as they well should be pretty please) as a signal when to refetch for a particular origin.
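A minimal sketch of "fetch for each new origin, then refetch per HTTP cache semantics", assuming a hypothetical in-memory `PolicyStore` keyed by origin. The freshness lifetime is passed in by the caller; how it is derived from response headers is out of scope here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PolicyStore:
    # origin -> time (in seconds) at which the cached policy stops being fresh
    fresh_until: Dict[str, float] = field(default_factory=dict)

    def record(self, origin: str, now: float, lifetime: float) -> None:
        """Remember that a policy response for `origin` is fresh for `lifetime` seconds."""
        self.fresh_until[origin] = now + lifetime

    def needs_fetch(self, origin: str, now: float) -> bool:
        """True for origins not seen before, or whose cached policy has expired."""
        expiry = self.fresh_until.get(origin)
        return expiry is None or now >= expiry
```

The "scope for which the policy won't be updated anyway" part (for example, a document that stays open for hours) would be an additional rule on top of this and is not modeled.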

domenic commented 5 years ago

Got it. Then yeah, this feels expensive, but I'd like someone more familiar with actual implementation costs to weigh in... I'll try to rustle Chrome networking folks; could you ask some Mozilla ones?

domenic commented 5 years ago

So in talking with the Chrome networking folks, the general feeling was that this was expensive, especially potentially for server operators. It still might be worth experimenting with, and there are discussions around potential alternatives (e.g. an "extension frame" is apparently a thing we could use?). But the general sentiment is that it'd be better not to push for this immediately, instead waiting to see how important the sync-update case ends up being.

I tried to capture this all in https://github.com/WICG/origin-policy/blob/master/version-negotiation.md#potential-extension . There I note that this proposal is a compatible extension of the design in https://github.com/WICG/origin-policy/blob/master/version-negotiation.md. (In particular, this proposal doesn't really make the Origin-Policy response header redundant.)

I think as we go to write the spec, we might want to explicitly allow user agents to request the origin policy out of band, or concurrently with the main request, or similar, so they can experiment with strategies like this, or strategies like updating the user's often-visited sites' origin policies.

annevk commented 5 years ago

The drawbacks there mention that it doubles server load, but that assumes these policies basically have no lifespan whatsoever. I would expect policies to last quite a bit longer than not at all.

Fair point on Origin-Policy still having value though.

domenic commented 5 years ago

Well, most importantly it doubles server load for any server that hasn't been updated to deploy a long-lasting origin policy, i.e. every server in existence today. Over time servers could update themselves, but it might be a rough transition.

annevk commented 5 years ago

Well, if there's a 404 we could just wait a day before trying again (unless there's an Origin-Policy header in between).
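The "wait a day after a 404, unless an Origin-Policy header shows up" rule could look roughly like this. The class and method names are hypothetical, and the one-day backoff is simply the figure from the comment above.

```python
from typing import Dict

RETRY_AFTER_404_SECONDS = 24 * 60 * 60  # "wait a day"


class NegativePolicyCache:
    """Tracks origins whose policy URL returned 404, to avoid re-requesting it."""

    def __init__(self) -> None:
        self._retry_at: Dict[str, float] = {}

    def record_404(self, origin: str, now: float) -> None:
        """Start the backoff window for an origin that has no policy."""
        self._retry_at[origin] = now + RETRY_AFTER_404_SECONDS

    def saw_origin_policy_header(self, origin: str) -> None:
        """An Origin-Policy response header cancels the backoff for that origin."""
        self._retry_at.pop(origin, None)

    def should_fetch(self, origin: str, now: float) -> bool:
        retry_at = self._retry_at.get(origin)
        return retry_at is None or now >= retry_at
```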

domenic commented 5 years ago

That gets a bit far away from the "just use HTTP semantics" strategy though, and more into the "browser heuristics" territory.

(I did a quick spec check: HTTP leaves it up to the client whether it considers 404s cacheable or not; Chrome currently considers them uncacheable. If the client does cache them, it needs to follow the usual caching rules with regard to respecting headers etc. Note that e.g. https://facebook.com/.well-known/origin-policy has headers cache-control: private, no-cache, no-store, must-revalidate, pragma: no-cache, and expires: Sat, 01 Jan 2000 00:00:00 GMT. I'm not sure how typical that is, but it's at least one data point.)
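To illustrate why the headers quoted above rule out reuse under the usual caching rules, here is a simplified sketch. Both function names are hypothetical, and the parser is deliberately naive: it splits on commas and does not handle quoted strings that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control header value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def forbids_unvalidated_reuse(value: str) -> bool:
    """True if the response must not be stored (no-store) or must be
    revalidated before every reuse (no-cache)."""
    directives = parse_cache_control(value)
    return "no-store" in directives or "no-cache" in directives
```

For the value `private, no-cache, no-store, must-revalidate`, `forbids_unvalidated_reuse` returns `True`, so a client following plain HTTP semantics would contact the server on every request.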

annevk commented 5 years ago

I'd be curious to hear @mnot's take on that, but I suspect we'll need some additional logic either way.

domenic commented 5 years ago

I've renamed this issue to "Always fetch the origin policy on every request", to reflect the part of the discussion that isn't yet in the explainer. The idea of using a single location was incorporated into the version negotiation doc.

mnot commented 4 years ago

404 allows clients to heuristically cache responses, meaning that if they don't have an explicit freshness lifetime, the client can synthesise one.

So, OP could specify a heuristic for this resource -- e.g., if there isn't an explicit freshness lifetime, consider it to be one hour.

Also, it's possible to specify a caching layer "above" HTTP -- such as has been done with the image cache. So even if there is an explicit lifetime, you might specify that it has a minimum freshness lifetime of something like ten seconds, or allow it to be used by multiple responses on a page, etc.
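The two ideas above (a heuristic lifetime when none is given, and a minimum lifetime enforced by a layer above HTTP) can be sketched together. The function name is hypothetical, and the one-hour and ten-second values are just the examples from this comment, not agreed numbers.

```python
from typing import Optional

HEURISTIC_LIFETIME_SECONDS = 60 * 60  # used when no explicit lifetime is given
MINIMUM_LIFETIME_SECONDS = 10         # floor applied even to explicit lifetimes


def policy_freshness_lifetime(explicit_lifetime: Optional[float]) -> float:
    """Return how long (in seconds) a policy response may be reused.

    `explicit_lifetime` is the lifetime derived from the response headers,
    or None when the response carries no explicit freshness information.
    """
    if explicit_lifetime is None:
        return float(HEURISTIC_LIFETIME_SECONDS)
    return max(float(explicit_lifetime), float(MINIMUM_LIFETIME_SECONDS))
```

Under these assumptions, an explicitly uncacheable response (lifetime 0) would still be reused for ten seconds, which is enough to cover multiple requests on one page.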

WRT Facebook - it looks like they've chosen to make their 404s explicitly uncacheable. shrug. I'm sure there are other examples of sites like this out there, but I suspect they'll adjust (rather quickly).

WICG / origin-policy

Always fetch the origin policy on every request #44