Open hloeung opened 3 years ago
We use stale-while-revalidate only to try to avoid unnecessary delay for the end user on cache expiry. For this purpose, all we need is for the delta between stale-while-revalidate and max-age to be large enough that we would have at least one visitor to any given page (for any given cache node) in between the two. Currently the delta is 1 minute (300 vs 360 seconds), which should be sufficient for pages with any significant traffic.
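To make the timing concrete, here is a rough sketch of how a cache might classify a stored response by its age. It assumes max-age=300 and stale-while-revalidate=360, which is my reading of the numbers above; the function name is purely illustrative. One caveat worth checking against the actual cache: RFC 5861 counts the stale-while-revalidate value from the moment the response becomes stale, so under a strict reading the stale window would last 360 seconds after expiry rather than 60.

```python
def cache_state(age: int, max_age: int, swr: int) -> str:
    """Classify a cached response by its age in seconds (RFC 5861 reading)."""
    if age < max_age:
        return "fresh"             # serve from cache, no origin contact
    if age < max_age + swr:
        return "stale-revalidate"  # serve stale now, refresh in the background
    return "expired"               # the visitor waits for the origin


# Assumed values: max-age=300, stale-while-revalidate=360.
for age in (120, 400, 700):
    print(age, cache_state(age, max_age=300, swr=360))
```

Either way, the point stands: the first visitor after expiry gets the stale copy immediately while the cache refreshes in the background.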
If you want to mask errors, the Cache-Control directive to use would be stale-if-error:
stale-if-error={seconds} Indicates the client will accept a stale response if the check for a fresh one fails. The seconds value indicates how long the client will accept the stale response after the initial expiration.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
I'm wary of doing this because I'm worried about errors being hidden from us, such that we don't notice them until some potentially more significant downstream effect emerges instead (e.g. a page 500s because its operations are too slow/expensive, then those expensive calls compound and bring down the pods altogether).
If we had another robust mechanism for making sure these errors were brought to our attention in a reliable way, then I'd be more than happy to mask them for the end user. However, at the moment we don't have such a reliable system. We have Sentry, but we get so many errors in there that we can't currently treat new errors coming in as an immediate priority, so this doesn't work for this purpose.
I think we'd need an investigation into what would be the best way to achieve a reliable notification system for important errors before we can add stale-if-error.
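For reference, a minimal sketch of what stale-if-error would change, assuming RFC 5861 semantics. The helper names and the 7200-second value (two hours, in line with the 2-3 hours suggested in the original report) are illustrative only, not anything we have configured.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: int or True}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(value) if value.isdigit() else True
    return directives


def may_serve_on_error(age: int, header: str) -> bool:
    """True if a cache may answer from its stored copy when the origin fails."""
    cc = parse_cache_control(header)
    max_age = cc.get("max-age", 0)
    window = cc.get("stale-if-error", 0)  # counted from the moment of expiry
    return age < max_age + window


# Hypothetical header; a response that is 1000 seconds old and an origin returning 500s.
print(may_serve_on_error(1000, "max-age=300, stale-if-error=7200"))
print(may_serve_on_error(1000, "max-age=300, stale-while-revalidate=360"))
```

With the directive present, the cache keeps answering from the stored copy for the length of the window while the origin is failing, which is exactly the masking I'm wary of.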
Hi,
Currently:
This was added in https://github.com/canonical-web-and-design/canonicalwebteam.flask-base/pull/17. I think the value is too low: if/when there are issues with the K8s cluster hosting the various websites, we'll want the content cache to serve objects out of its cache rather than some error page. I think something like 2-3 hours should be used here.
Or maybe I'm misunderstanding how stale-while-revalidate works for caches fronting a backend (e.g. Nginx in our case).