Open IchordeDionysos opened 3 weeks ago
This is not an issue with Express and would happen with any HTTP framework, as the CDN caches the headers & content.
If I understood this correctly, your assumption is that GET requests from different origins should return different headers, if that is true then consider in your implementation:
Access-Control-Allow-Origin: *
instead of origin if the origin is allowed to access the resourceIf I understood this correctly, your assumption is that GET requests from different origins should return different headers.
Yes that's the assumption and the solution to handle caching in that case would be to return a Vary: Origin
header, which we do but in connection with other standards this doesn't play nicely together.
Different routes for different origins (Or query parameters) → This is not a very clean solution ... as it will make the usage for consumers more complicated while it could be solved on the server-side with different ETags.
Cookie-based caching → This won't work as a CDN will heavily speed up the application, separate requests by cookie would defeat the purpose of the cache.
Make CORS requests return Access-Control-Allow-Origin: instead of origin if the origin is allowed to access the resource → For security reasons this is not a good solutions, especially if you need to use credentials in requests. Returning `access-controll-allow-origin: ` will be prohibited.
No CDN cache on these cases, handle the cache in the application → Again this would not be a good solution as CDN cache speeds up the application significantly!
This is not an issue with Express and would happen with any HTTP framework, as the CDN caches the headers & content.
Regarding this comment. It is indeed expected and not the issue that the CDN caches the headers and the content.
It is particularly this combination of the browser using a different ETag from a request cached for a different origin that doesn't play nicely with the feature.
I'd say remove the ETag from those responses. Either using middleware or disable for all.
Add The “Vary: Origin” header for security: https://www.google.com/search?q=Vary+cross-origin+header
When a browser makes a CORS request, it automatically adds an “Origin” header to indicate the source of the request. The server must check this header and add an “Access-Control-Allow-Origin” header to indicate which sources can access the requested resource.
If a request may contain a Access-Control-Allow-Origin with different values, then the CDN should always respond with Vary: Origin, even for responses without an Access-Control-Allow-Origin header.
If the header isn't always present, it would be possible to fill the cache with incorrect values.
My two cents.
From my own dive into the RFCs, this is a nasty problem with how the 304 response is defined and how headers work in HTTP caches, including caches inside web browsers:
MUST
generate in a 304 and states that it SHOULD NOT
send any others "unless said metadata exists for the purpose of guiding cache updates".
Content-Location
, Date
, ETag
, Vary
, Cache-Control
, and Expires
.Access-Control-Allow-Origin
or Access-Control-Allow-Certificate
. Does not, in any way, refer to headers implicated by the Vary
field.'Access-Control-Allow-Origin
, so it must be cached.Vary
headers and some of which don't, it should prefer responses with a "valid" Vary field value, making the responses with no such header functionally useless in the cache.The upshot to all this:
Access-Control-Allow-Origin
header; thus, an intermediate cache receiving updated Access-Control-Allow-Origin
headers during successful revalidation, and subsequently sending a 304 of its own, is expected not to propagate that updated headerAccess-Control-Allow-Origin
headers should rewrite the header stapled to cached prior requests from other originsTherefore, ACAO with values other than *
can only successfully transit caches if every single cache along the way is strategically violating the spec:
Vary: origin
requests with different origins, even when the validators matchIf it doesn't do both of these things, then at some point a downstream client is going to process a response with an ACAO header that doesn't match the Origin and treat this as a security error. If the ACAO headers get dropped along the way, a request coming in from a new-but-acceptable origin will not see the new ACAO response specifying the origin from the request. If the ACAO headers get propagated, and written back to cached rows as the spec demands, then we get similarly broken results from a sequence of requests: request from origin A, miss cache, receive 200 (cache it), propagate response; request from origin B, revalidate, receive 304 (additional cache row sharing content, update headers), patch cached body onto new ACAO header; request from origin A, hit cache, respond with the cached response and headers -- which now includes the ACAO header from origin B because the cache must use the most recent headers.
It's not hard to imagine a way to do this "right": always send the ACAO header and don't cross-update headers for other cache rows pertaining to the same request path when those requests varied in some relevant dimension (according to the Vary
header). This violates the spec but works as intended. If every cache along the way chooses to violate the spec in this way, ACAO works as intended.
Patching a hash of the ACAO
header onto the end of the Etag -- or similar -- avoids header cross-contamination by, morally, treating the ACAO header as part of the content, thereby forcing unchanged content to re-transfer as an undesirable side effect of HTTP caching semantics, where caches are encouraged to strip relevant headers and required to cross-update headers based on resource validators that specifically do not cover the header.
Removing the ETag does not help. Per RFC 9111 4.3.1, the cache is allowed to use timestamps instead, even when a Vary
header should make everyone involved kind of sus that timetsamps don't tell enough of the story. As far as the spec is concerned, if the server wanted to be clear that the timestamp isn't enough to identify a content revision, it should use Etag...
Patching a CDN to emit ACAO headers on a 304 response does help but if there are more caches along the way (caching proxies, end-user browser caches), those caches are free to exhibit the same behaviors.
TL;DR: I want the ETag to change when the
access-control-allow-origin
header is different but the response body is the same.We are facing an issue right now with a couple moving parts working poorly together.
It's a bit difficult to explain but I'll try my best.
This is kind of messed up, as all parties do comply with the specs, though when put together, the specs don't work nicely together.
The best way to fix this would be for the server to generate a different ETag when the
access-control-allow-origin
header differs.