US-GHG-Center / veda-config-ghg

Veda config for GHG
https://ghg-demo.netlify.app

Disable S3 Cache during the deployment #280

Closed by amarouane-ABDELHAK 7 months ago

amarouane-ABDELHAK commented 8 months ago

Why are you creating this Pull Request?

We are disabling S3 caching to force the web application to fetch the latest assets from S3 (we still need to ensure there is no caching on the CloudFront side).
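
The diff itself is not shown in this thread, but as a hedged sketch of what "disabling S3 caching" during deployment typically looks like with the AWS CLI (the bucket name and build directory below are placeholders, not taken from this repository):

```bash
# Hypothetical deploy step: upload the built site with caching disabled,
# so browsers and CloudFront always revalidate against S3.
aws s3 sync ./dist "s3://${BUCKET_NAME}" \
  --delete \
  --cache-control "no-cache"
```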

netlify[bot] commented 8 months ago

Deploy Preview for ghg-demo ready!

Latest commit: 04f2084e6410d85443217757bcae34a698963abe
Latest deploy log: https://app.netlify.com/sites/ghg-demo/deploys/65c3a346786939000801f835
Deploy Preview: https://deploy-preview-280--ghg-demo.netlify.app/

amarouane-ABDELHAK commented 8 months ago

I have emulated the changes and this approach worked in staging without any problems. (Screenshot attached: Screen Shot 2024-01-31 at 4:14:34 PM)

edkeeble commented 8 months ago

The proposed change here will disable caching at both the browser>cloudfront layer and cloudfront>s3 layer, but I think you could probably find a middle ground where you retain the caching benefits of cloudfront while still minimizing stale content for users.

Rather than using no-cache, I would suggest leveraging the s-maxage directive to tell Cloudfront to cache files from S3 for an extended period (maybe even the 7 days they're currently cached for) and setting max-age (which would then control only browser caching) to a significantly lower number like 3-5 minutes. This approach would have the following effect:

  1. Browsers would cache objects for at most 3-5 minutes before re-requesting them from Cloudfront, so unless you are deploying very frequently, your content shouldn't be stale for long.
  2. Cloudfront will rarely hit the S3 origin to retrieve new versions of a file, so you avoid the latency and cost of always hitting S3. The potential downside here is the content becomes stale in the edge cache, but you can mitigate this by:
     a. Cache-busting filenames for all assets (you already seem to be doing this)
     b. Invalidating any files which can't be cache-busted (such as index.html) in cloudfront, forcing a refresh (a command-line sketch of this setup follows the list)
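
A hedged sketch of this middle ground in a deploy script, assuming the site is uploaded with aws s3 sync (with --delete, as mentioned later in the thread) and served through CloudFront; the bucket name, build directory, and distribution ID are placeholders:

```bash
# Browsers cache for 5 minutes (max-age); the CloudFront edge caches for
# 7 days (s-maxage), so the S3 origin is rarely hit.
aws s3 sync ./dist "s3://${BUCKET_NAME}" \
  --delete \
  --cache-control "max-age=300, s-maxage=604800"

# Invalidate only the files that cannot be cache-busted (e.g. index.html),
# so the edge serves the new entry point right after the deployment.
aws cloudfront create-invalidation \
  --distribution-id "${CLOUDFRONT_DISTRIBUTION_ID}" \
  --paths "/index.html"
```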

The trick to the above approach is you can't just invalidate all files, as you're currently doing, because you're also deleting old files with aws s3 sync, so that would put you in a bad state for the 3-5 minutes that the old index.html file is cached in the browser. Here's how that would go:

  1. User has old index.html cached in browser and it hasn't expired. This file references the filenames of old assets, which have now been deleted from S3.
  2. Browser requests old assets from cloudfront. If they've been invalidated, CF goes to S3 to retrieve them and returns a 404.
  3. If the old assets haven't been invalidated, CF returns the cached versions of those files and retains them until they expire, so the old content is still rendered correctly up until the browser decides to request a new version of index.html (3-5 mins later at most).

There is still an outside chance that you get a 404 on an old asset if it happens to expire in the CF cache within a 3-5 minute window of making a deployment, but that seems pretty unlikely and you could make it even more unlikely by further increasing the s-maxage directive. If you really wanted to eliminate this possibility, you would need to retain old assets in s3 for a minimum of one deployment.
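
If that last refinement were wanted, one sketch (again with placeholder names) would be to stop passing --delete at deploy time and prune superseded files separately, so assets referenced by a still-cached old index.html survive at least one deployment:

```bash
# Upload the new build WITHOUT --delete: previously deployed, cache-busted
# assets stay in the bucket and can still be served to browsers holding an
# old index.html.
aws s3 sync ./dist "s3://${BUCKET_NAME}" \
  --cache-control "max-age=300, s-maxage=604800"

# Superseded assets can be cleaned up later (e.g. during the next deployment
# or via an S3 lifecycle rule), once the short browser cache window has passed.
```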

edkeeble commented 8 months ago

If you really wanted to minimize requests to cloudfront, you could set different max-age values for cache-busted files vs non-cache busted files. Realistically, a cache-busted file can be cached forever, since it will never change, so you would only need to set a low max-age for files like index.html.
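
Because aws s3 sync applies a single --cache-control value to everything it uploads, one way to sketch this split is two passes with filters; the file layout and values below are assumptions about the build output, not taken from the repository:

```bash
# Pass 1: cache-busted (hashed) assets can effectively be cached forever.
aws s3 sync ./dist "s3://${BUCKET_NAME}" \
  --exclude "index.html" \
  --cache-control "max-age=31536000, immutable"

# Pass 2: the entry point gets a short browser TTL and a long edge TTL.
aws s3 cp ./dist/index.html "s3://${BUCKET_NAME}/index.html" \
  --cache-control "max-age=300, s-maxage=604800"
```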

amarouane-ABDELHAK commented 8 months ago

@edkeeble thank you so much for your valuable input, I liked the idea of a middle ground. Your comments inspired me to think about another idea: we keep max-age as it was before, add the s-maxage directive (thanks again for suggesting this, I totally forgot about it), and add the other missing directive, must-revalidate, so the browser revalidates a file once it becomes stale. I tested this solution locally and it seemed to work: --cache-control max-age=604800,must-revalidate,s-maxage=604800

edkeeble commented 8 months ago

@amarouane-ABDELHAK Won't must-revalidate only have an effect once the resource is stale (ie when max-age has elapsed)? In this case, you would still be caching index.html locally for one week, no?

amarouane-ABDELHAK commented 8 months ago

> @amarouane-ABDELHAK Won't must-revalidate only have an effect once the resource is stale (ie when max-age has elapsed)? In this case, you would still be caching index.html locally for one week, no?

What I observed is that it does revalidate the cache even before max-age has elapsed. I will do more testing...

edkeeble commented 8 months ago

~But aren't you trying to solve for a case where the browser caches the index file for too long? In that case, must-revalidate won't help because the index file won't have become stale in the browser cache yet.~

~Sorry, just re-read your comment and saw you said "before max-age elapsed". That isn't what I've observed. I put a quick test together and even with must-revalidate, the browser uses the cached version of a file until max-age has elapsed. This matches what the docs say about that directive. Otherwise, must-revalidate would be effectively like no-cache, wouldn't it?~

Forgive my evolving understanding of must-revalidate. After testing some more, it looks like this is what's happening when must-revalidate is set:

  1. If the age of a resource exceeds max-age, the browser will always re-fetch it from cloudfront, regardless of whether it was recently fetched. So if you refresh a couple of times after max-age has been exceeded, the browser will hit cloudfront every time.
  2. The age is set by cloudfront based on the last time it retrieved the resource from S3, which is controlled by s-maxage.

So for our use case, let's imagine an index.html file with max-age=30,must-revalidate,s-maxage=60: the browser treats its copy as fresh for 30 seconds, but the Age reported by CloudFront counts from the last time CloudFront pulled the file from S3 (up to 60 seconds earlier), so the browser can receive a response that is already stale and end up revalidating sooner than 30 seconds after fetching it.

So given that, if max-age and s-maxage are both set to 604800, stale content could still remain cached in the browser for up to 7 days, though it becomes a fair bit more complex, since age is set based on the last time cloudfront fetched the file from s3. I still think we need a shorter max-age to guarantee the browser will check with cloudfront more regularly.
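
One hedged way to observe this is to look at the headers CloudFront returns for the entry point and compare Age against max-age (the URL below is a placeholder):

```bash
# HEAD request showing the caching-related response headers. If Age already
# exceeds max-age, a browser honoring must-revalidate treats the response as
# stale immediately and revalidates with CloudFront on the next request.
curl -sI "https://example.cloudfront.net/index.html" \
  | grep -iE '^(age|cache-control|x-cache):'
```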

amarouane-ABDELHAK commented 8 months ago

Sorry for the quality of the video, but this is what I observed consistently after many retries: https://github.com/NASA-IMPACT/veda-config-ghg/assets/33136280/07dce976-112a-47f2-8962-98d6759c6726

amarouane-ABDELHAK commented 8 months ago

We can shorten the cache age; I will update the PR.

edkeeble commented 7 months ago

I think you could still set s-maxage pretty high in order to leverage the edge cache, especially since you can invalidate files from the CF cache on each deployment.

amarouane-ABDELHAK commented 7 months ago

Yes, I thought about it some more, and since we invalidate the CloudFront cache every time we deploy, we should be safe. I've set the value to 7 days (the same value as in the documentation).
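
Putting the thread's conclusion together as a sketch, and assuming the 7-day value refers to s-maxage while the shortened browser max-age and all names below are only illustrative, the deploy step would end up roughly like this:

```bash
# Shorter browser TTL (illustrative value), 7-day edge TTL, and revalidation
# once the browser copy goes stale.
aws s3 sync ./dist "s3://${BUCKET_NAME}" \
  --delete \
  --cache-control "max-age=300, must-revalidate, s-maxage=604800"

# Invalidate the CloudFront cache on every deployment so the edge serves the
# new build immediately.
aws cloudfront create-invalidation \
  --distribution-id "${CLOUDFRONT_DISTRIBUTION_ID}" \
  --paths "/*"
```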