Enabling fast Origin Policy application

yoavweiss commented 4 years ago

Following discussions about making Feature Policy and Accept-CH application easier (e.g. this one), I think we could make Origin Policy faster than the current scheme.

Talking to @domenic, I understood that the current scheme is as follows:

Browser sends top-level navigation request, and gets response.
If the response contains an Origin-Policy header, it downloads the policy from its well-known location.
The document's commit is blocked on the retrieval of that response.

What would be y'all's feedback on an alternative scheme:

Browser sends request to Origin Policy's well-known location and the top-level navigation request.
- We can limit this to H2+ connections, where this shouldn't have a negative performance impact [1]
The Origin-Policy would also include an indication of whether the applied policy should be commit-blocking (e.g. CSP, Origin Isolation) or non-blocking (e.g. Accept-CH, Feature-Policy)
The Origin-Policy resource will only block the commit if the policy is marked as blocking
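
A minimal sketch of the alternative scheme as a simulation, not a browser implementation. It assumes the navigation response itself carries the blocking marker (the hypothetical `policy_blocking` field); `fetch_after` and `navigate` are made-up names, and the delays stand in for real network fetches.

```python
import asyncio


async def fetch_after(delay_s, value):
    # Stand-in for a network fetch: yields `value` after `delay_s` seconds.
    await asyncio.sleep(delay_s)
    return value


async def navigate(nav_delay_s, policy_delay_s, blocking):
    # Both requests go out together instead of one after the other.
    policy_task = asyncio.create_task(fetch_after(policy_delay_s, "policy"))
    nav_task = asyncio.create_task(
        fetch_after(nav_delay_s, {"policy_blocking": blocking})
    )

    response = await nav_task
    if response["policy_blocking"]:
        # Blocking policy (e.g. CSP): the commit waits for the policy.
        await policy_task
        return "committed with policy"
    if policy_task.done():
        # Non-blocking policy that arrived before the navigation response.
        return "committed with policy"
    # Non-blocking policy that is still in flight. A real browser would
    # presumably apply it on arrival; the sketch cancels it to stay tidy.
    policy_task.cancel()
    return "committed without policy"


if __name__ == "__main__":
    print(asyncio.run(navigate(0.05, 0.01, blocking=False)))
    print(asyncio.run(navigate(0.01, 0.05, blocking=False)))
    print(asyncio.run(navigate(0.01, 0.05, blocking=True)))
```

The second call is the case where a non-blocking policy is not yet available at commit time; the third shows a blocking policy holding the commit until the policy arrives.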

The above may make the application of the non-blocking features somewhat racy, but there's reason to believe that the typical case would win that race [2].

That seems like something that would be Fast Enough™ in the blocking case (because we're saving an RTT, and HTML is typically harder to generate than a static policy resource), and would solve the deployment issues folks have been struggling with, which currently "require" http-equiv.
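
A rough worked example of the RTT saving, under simplifying assumptions: connection setup is ignored, and the millisecond figures are invented for illustration rather than measured. The function names are hypothetical.

```python
def commit_ms_current(nav_ms, policy_ms):
    # Current scheme, for a response that carries an Origin-Policy header:
    # the policy fetch only starts once the navigation response arrives.
    return nav_ms + policy_ms


def commit_ms_parallel(nav_ms, policy_ms, blocking):
    # Proposed scheme: both requests start together, and only a blocking
    # policy can delay the commit.
    return max(nav_ms, policy_ms) if blocking else nav_ms


nav_ms = 100 + 200     # one RTT plus HTML generation (made-up figures)
policy_ms = 100 + 10   # one RTT plus serving a static policy file

print(commit_ms_current(nav_ms, policy_ms))           # 410
print(commit_ms_parallel(nav_ms, policy_ms, True))    # 300
print(commit_ms_parallel(nav_ms, policy_ms, False))   # 300
```

With these numbers the blocking case commits as soon as the HTML is ready, because the static policy resource finishes first.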

Thoughts?

/cc @mikewest @eeeps @colinbendell

[1] 404s can have a negative performance impact, as they can be huge or take a slow path on the server side. So the "no negative impact" claim requires proof.

[2] Again, proof required.

domenic commented 4 years ago

the top-level navigation request

If we restrict ourselves to top-level navigation requests, then origin policy is no longer useful for CORS preflight reduction, which is IMO (and Mozilla's, if I understand @annevk correctly) the most important use case. So we'd need to send it on all requests.

We can limit this to H2+ connections, where this shouldn't have a negative performance impact [1]

This was discussed in #44. I asked the Chrome networking team about this and their main concern was that doubling the number of streams/packets Chrome makes to every server on the internet, with no opt-in, would cause problems for server operators, since most server software has stream limits. Quoting from an internal thread:

If we were clever enough we could get both requests into the same packets. But I think it might be hard to do that and so we might double the number of packets chrome sends which might not be ideal. But an experiment could tell us more.

With HTTP/2 and HTTP/3 the server imposes a limit on the number of concurrent streams that the client can open. Historically, this limits the number of concurrent "actual" requests. With this new behavior, these "extra" requests would count against those limits as well.

Also, getting back non-cacheable 404s from every non-origin-policy-using server on the internet is not super-cheap, especially if those 404s are large-ish HTML pages (e.g. the one you can find at https://www.facebook.com/asdf).

Note that this would require turning on credentials for the origin policy request if you want to reuse the HTTP/2 connection, but things were already trending in that direction; see #21 and #89. I know @sleevi had some concerns about that direction, though.

The other issue with this approach is that it does not allow any negotiation between individual pages and the origin policy version. That might be OK, but it's hard to say. I'd encourage you to step through each of the examples in https://wicg.github.io/origin-policy/#examples and make sure that this scheme could work with them.

domenic commented 4 years ago

I have another meta-concern, which is that if you believe it's easier for web applications to configure files at /.well-known/origin-policy than it is for them to configure per-page headers, then the security model at https://wicg.github.io/origin-policy/#well-known-security is flawed. We definitely want origin policy to be at least as hard to deploy as headers, and preferably harder, since it has such wide-ranging impacts. So maybe we'd want to keep the header opt-in just from a security perspective.

eeeps commented 4 years ago

@domenic Hard for who? And for what reasons?

It's difficult for @Cloudinary to explain how to configure HTTP headers in our documentation, because the ecosystem of tooling that's sending headers is so diverse.

I don't think that Origin Policy is necessarily "easier" than HTTP headers, as a means to set an origin-wide Feature Policy. I'd argue that for people who don't own and control the origin in question, it is actually much harder than both headers and http-equiv. I do think that Origin Policy is a more universal way to achieve this end, which can go a long way towards education and (educated) adoption.

annevk commented 4 years ago

Aren't CDNs ideally positioned to make configuring headers for responses easier?

yoavweiss commented 4 years ago

the top-level navigation request

If we restrict ourselves to top-level navigation requests, then origin policy is no longer useful for CORS preflight reduction, which is IMO (and Mozilla's, if I understand @annevk correctly) the most important use case. So we'd need to send it on all requests.

Surely, we could cache the response, right? That would still mean we would need to fetch the origin policy for each cross-origin request on the page on its first visit. But if we make it non-blocking unless the resource indicates it requires a policy, that may not be too taxing.
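
A minimal sketch of the per-origin caching idea, assuming a hypothetical `fetch_policy` callable that retrieves a policy from a URL. Expiry, revalidation and default-port normalization are left out, so this only shows that repeat requests to an origin reuse the first fetch.

```python
from urllib.parse import urlsplit


class OriginPolicyCache:
    """Caches one policy per origin so only the first request pays the fetch."""

    def __init__(self, fetch_policy):
        self._fetch_policy = fetch_policy  # hypothetical fetch function
        self._policies = {}

    def policy_for(self, url):
        parts = urlsplit(url)
        # Simplified origin: scheme plus host[:port], no port normalization.
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin not in self._policies:
            well_known = origin + "/.well-known/origin-policy"
            self._policies[origin] = self._fetch_policy(well_known)
        return self._policies[origin]


if __name__ == "__main__":
    fetched = []

    def fake_fetch(url):
        fetched.append(url)
        return {"source": url}

    cache = OriginPolicyCache(fake_fetch)
    cache.policy_for("https://api.example.com/a")
    cache.policy_for("https://api.example.com/b")
    print(len(fetched))  # 1
```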

This was discussed in #44.

Thanks for the pointer. That indeed sounds very much like what I was suggesting.

I asked the Chrome networking team about this and their main concern was that doubling the number of streams/packets Chrome makes to every server on the internet, with no opt-in, would cause problems for server operators, since most server software has stream limits. Quoting from an internal thread:

If we were clever enough we could get both requests into the same packets. But I think it might be hard to do that and so we might double the number of packets chrome sends which might not be ideal. But an experiment could tell us more.

An experiment here would indeed be interesting. I agree that there's a chance that those 404s may increase server loads, but heuristic client-side caching may make that less of an issue (and I'm not sure what the heuristic server-side caching situation is, e.g. by CDNs; that could mitigate the concern entirely).

With HTTP/2 and HTTP/3 the server imposes a limit on the number of concurrent streams that the client can open. Historically, this limits the number of concurrent "actual" requests. With this new behavior, these "extra" requests would count against those limits as well.

In practice, I suspect that would add a single request to those limits, so I wouldn't expect this to be a significant change.
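
A toy illustration of the stream accounting, assuming a hypothetical limit of 100 concurrent streams (the value a server would advertise, e.g. via HTTP/2's SETTINGS_MAX_CONCURRENT_STREAMS). `StreamBudget` and `admitted_page_requests` are made-up names, not any networking stack's API.

```python
class StreamBudget:
    """Tracks open streams against a server-advertised concurrency limit."""

    def __init__(self, max_concurrent_streams):
        self.max_concurrent_streams = max_concurrent_streams
        self.open_streams = 0

    def try_open(self):
        # A new stream is refused (i.e. must be queued) once the limit is hit.
        if self.open_streams >= self.max_concurrent_streams:
            return False
        self.open_streams += 1
        return True


def admitted_page_requests(limit, wanted, speculative_policy_request):
    budget = StreamBudget(limit)
    if speculative_policy_request:
        budget.try_open()  # the extra origin policy request takes one slot
    return sum(budget.try_open() for _ in range(wanted))


print(admitted_page_requests(100, 100, speculative_policy_request=False))  # 100
print(admitted_page_requests(100, 100, speculative_policy_request=True))   # 99
```

Under this model the speculative request costs exactly one slot per connection while it is in flight, which only matters when the page is already at the limit.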

Also, getting back non-cacheable 404s from every non-origin-policy-using server on the internet is not super-cheap, especially if those 404s are large-ish HTML pages (e.g. the one you can find at https://www.facebook.com/asdf).

I agree that may be expensive.

But I can see browsers make smart decisions towards making this deployment easier:

Heuristically cache Origin Policy 404s beyond regular 404 heuristic caching.
Perform server-side crawls and seed the browser with origins that support OP up until some critical mass is reached.
Start by deploying Origin Policy only for top-level navigation requests and take it from there.
Work with CDNs and hosting providers to make sure Origin Policy 404s are edge-cached.
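
A minimal sketch of the first idea in the list above, caching Origin Policy 404s so that an origin without a policy is not probed again for a while. The `NotFoundCache` name and the one-hour TTL are assumptions for illustration; the thread does not propose a specific lifetime.

```python
import time


class NotFoundCache:
    """Remembers origins whose policy probe returned 404, for `ttl_s` seconds."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so the sketch can be tested
        self._expires_at = {}

    def record_not_found(self, origin):
        self._expires_at[origin] = self._clock() + self._ttl_s

    def should_probe(self, origin):
        # Probe if the origin was never seen or its cached 404 has expired.
        expires_at = self._expires_at.get(origin)
        return expires_at is None or self._clock() >= expires_at


if __name__ == "__main__":
    cache = NotFoundCache(ttl_s=3600)  # hypothetical one-hour lifetime
    print(cache.should_probe("https://example.com"))  # True
    cache.record_not_found("https://example.com")
    print(cache.should_probe("https://example.com"))  # False
```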

Note that this would require turning on credentials for the origin policy request if you want to reuse the HTTP/2 connection, but things were already trending in that direction; see #21 and #89. I know @sleevi had some concerns about that direction, though.

That indeed complicates things. Credentialed Origin Policy requests would indeed work for navigation requests, but may work less well for e.g. cross-origin fetch() requests.

The other issue with this approach is that it does not allow any negotiation between individual pages and the origin policy version. That might be OK, but it's hard to say. I'd encourage you to step through each of the examples in https://wicg.github.io/origin-policy/#examples and make sure that this scheme could work with them.

Can you expand on that? Going over the examples, it seems all the negotiation happens with the policy IDs, and not the policy URL (which remains fixed, which is great).

yoavweiss commented 4 years ago

Aren't CDNs ideally positioned to make configuring headers for responses easier?

CNAME CDNs are, as they serve an entire origin's traffic and can control its headers. Non-CNAME CDNs (e.g. Cloudinary) and other 3P providers (e.g. analytics vendors) have no control over the HTML-serving parts of the site, and find it hard to ask their customers to change headers, whereas modifying a text file at a well-known location may be an easier ask.

WICG / origin-policy

Enabling fast Origin Policy application #96