expressjs / express

Fast, unopinionated, minimalist web framework for node.
https://expressjs.com
MIT License
65.46k stars 16.04k forks source link

ETag should consider CORS headers #5986

Open IchordeDionysos opened 3 weeks ago

IchordeDionysos commented 3 weeks ago

TL;DR: I want the ETag to change when the access-control-allow-origin header is different but the response body is the same.


We are facing an issue right now with a couple moving parts working poorly together.

It's a bit difficult to explain but I'll try my best.

image
  1. When sending a request from Chrome to a spec-compliant CDN, it will initially hit the express server.
    1. The express server responds with the resource, CORS & Vary headers, and an ETag derived from the response body.
    2. The CDN will cache the response headers and body.
    3. Chrome will also cache the response headers and body.
  2. When Chrome requests the resource from a different origin, it may take the ETag cached for a different Origin and send it to the CDN (this is spec-compliant).
    1. This may then lead the CDN to respond with a 304 Not Modified, as the CDN may have a resource for the origin and ETag in the cache.
    2. Responding with 304 will only return partial headers (explicitly excluding CORS headers; Some spec says it should NOT return CORS headers here).
  3. This then leads Chrome to take the response from the cache and think that the Origins mismatch → throwing a CORS error.

This is kind of messed up, as all parties do comply with the specs, though when put together, the specs don't work nicely together.

The best way to fix this would be for the server to generate a different ETag when the access-control-allow-origin header differs.

Sealos commented 3 weeks ago

This is not an issue with Express and would happen with any HTTP framework, as the CDN caches the headers & content.

If I understood this correctly, your assumption is that GET requests from different origins should return different headers, if that is true then consider in your implementation:

IchordeDionysos commented 3 weeks ago

If I understood this correctly, your assumption is that GET requests from different origins should return different headers.

Yes that's the assumption and the solution to handle caching in that case would be to return a Vary: Origin header, which we do but in connection with other standards this doesn't play nicely together.

IchordeDionysos commented 3 weeks ago

This is not an issue with Express and would happen with any HTTP framework, as the CDN caches the headers & content.

Regarding this comment. It is indeed expected and not the issue that the CDN caches the headers and the content.

It is particularly this combination of the browser using a different ETag from a request cached for a different origin that doesn't play nicely with the feature.

NewEraCracker commented 3 weeks ago

I'd say remove the ETag from those responses. Either using middleware or disable for all.

Add The “Vary: Origin” header for security: https://www.google.com/search?q=Vary+cross-origin+header

When a browser makes a CORS request, it automatically adds an “Origin” header to indicate the source of the request. The server must check this header and add an “Access-Control-Allow-Origin” header to indicate which sources can access the requested resource.

If a request may contain a Access-Control-Allow-Origin with different values, then the CDN should always respond with Vary: Origin, even for responses without an Access-Control-Allow-Origin header.

If the header isn't always present, it would be possible to fill the cache with incorrect values.

My two cents.

AdamNorberg commented 2 weeks ago

From my own dive into the RFCs, this is a nasty problem with how the 304 response is defined and how headers work in HTTP caches, including caches inside web browsers:

The upshot to all this:

Therefore, ACAO with values other than * can only successfully transit caches if every single cache along the way is strategically violating the spec:

If it doesn't do both of these things, then at some point a downstream client is going to process a response with an ACAO header that doesn't match the Origin and treat this as a security error. If the ACAO headers get dropped along the way, a request coming in from a new-but-acceptable origin will not see the new ACAO response specifying the origin from the request. If the ACAO headers get propagated, and written back to cached rows as the spec demands, then we get similarly broken results from a sequence of requests: request from origin A, miss cache, receive 200 (cache it), propagate response; request from origin B, revalidate, receive 304 (additional cache row sharing content, update headers), patch cached body onto new ACAO header; request from origin A, hit cache, respond with the cached response and headers -- which now includes the ACAO header from origin B because the cache must use the most recent headers.

It's not hard to imagine a way to do this "right": always send the ACAO header and don't cross-update headers for other cache rows pertaining to the same request path when those requests varied in some relevant dimension (according to the Vary header). This violates the spec but works as intended. If every cache along the way chooses to violate the spec in this way, ACAO works as intended.

Patching a hash of the ACAO header onto the end of the Etag -- or similar -- avoids header cross-contamination by, morally, treating the ACAO header as part of the content, thereby forcing unchanged content to re-transfer as an undesirable side effect of HTTP caching semantics, where caches are encouraged to strip relevant headers and required to cross-update headers based on resource validators that specifically do not cover the header.

Removing the ETag does not help. Per RFC 9111 4.3.1, the cache is allowed to use timestamps instead, even when a Vary header should make everyone involved kind of sus that timetsamps don't tell enough of the story. As far as the spec is concerned, if the server wanted to be clear that the timestamp isn't enough to identify a content revision, it should use Etag...

Patching a CDN to emit ACAO headers on a 304 response does help but if there are more caches along the way (caching proxies, end-user browser caches), those caches are free to exhibit the same behaviors.