aws-amplify / amplify-hosting

AWS Amplify Hosting provides a Git-based workflow for deploying and hosting fullstack serverless web applications.
https://aws.amazon.com/amplify/hosting/
Apache License 2.0

Question regarding cache-control header when using _next/image and Cloudfront #2376

Open JFDontigny opened 2 years ago

JFDontigny commented 2 years ago

Please describe which feature you have a question about?

I have a question regarding cache-control header behavior for a Next.js site with SSR, specifically for images served by the Next.js image component (_next/image) through CloudFront.

Provide additional details

While investigating performance on our site, we noticed that some images would load very slowly (800-1200ms for a small image), even though they were being served through Cloudfront. Inspecting the request in the browser, we noticed the response header contained the following:

Sometimes we would get a hit from CloudFront, but most of the time not, even on quick successive loads. So we compared with as close a setup as we could build in a local environment, and there the cache-control header behaves very differently.

The images in question use Next.js Image component to load dynamic images, specified by a headless CMS during build. The images themselves are hosted in S3, and the build process simply builds with the s3 URLs. As for the next/image setup, it is nothing crazy and uses the default loader. We want to use it for the srcset, resizing and other functionality it provides.

As such, in prod, the URL pattern for these images is the following: https://growth.design/_next/image?url=https%3A%2F%2Fs3.amazonaws.com%2Ffiles.growth.design%2Fcms%2Fcktlzgroz000l0umw6p4sgpf1.png&w=384&q=75
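For reference, a minimal sketch of how that URL is assembled from its parts. The helper name `buildNextImageUrl` is hypothetical (it is not a Next.js API); it only mirrors the pattern visible in the URL above, where the source image URL is percent-encoded into the `url` query parameter next to `w` and `q`. Any CDN cache key for these images would presumably need to cover all three parameters.

```typescript
// Hypothetical helper: rebuilds the /_next/image URL pattern shown above.
// The source URL is percent-encoded into the `url` query parameter.
function buildNextImageUrl(
  siteOrigin: string,
  sourceUrl: string,
  width: number,
  quality: number,
): string {
  const encoded = encodeURIComponent(sourceUrl);
  return `${siteOrigin}/_next/image?url=${encoded}&w=${width}&q=${quality}`;
}

const exampleUrl = buildNextImageUrl(
  "https://growth.design",
  "https://s3.amazonaws.com/files.growth.design/cms/cktlzgroz000l0umw6p4sgpf1.png",
  384,
  75,
);
console.log(exampleUrl);
```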

Also, these images do not change over time, and so are good candidates for aggressive caching. We do add new ones regularly (hence the headless CMS), but once they are live they very rarely change.

So basically, the behavior we would like to see is for CloudFront to hit the Lambda@Edge function once, then cache the result for a day. After a lot of tests, we don't know how best to achieve this using Amplify and CloudFront. We did experiment with next.config.js and added a customHttp.yml file. While these did end up setting the cache-control header, that did not seem to make CloudFront keep the object.

We did find a workaround that made the images load much faster: setting a custom cache policy on the _next/image pattern. The cache policy sets a minimum TTL of an hour and a maximum TTL of a day (with a default of one hour, and the appropriate query params). Even then, while CloudFront does respond directly without forcing the Lambda@Edge execution, the cache-control header is still public, max-age=0, s-maxage=600. Perhaps the biggest issue with our workaround is that the custom cache policy is reset every time a deployment happens, so we must be careful not to forget to re-add it. And the benefit of adding it is large: the page loads twice as fast.
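A rough sketch of what such a cache policy could look like as data. The field names follow the shape of CloudFront's `CachePolicyConfig`, but the interface is declared locally here so the snippet stands alone. The policy name and the whitelisted query strings (`url`, `w`, `q`, taken from the URL pattern above) are assumptions, since the exact settings were not posted.

```typescript
// Local type mirroring the general shape of a CloudFront CachePolicyConfig.
interface CachePolicySketch {
  Name: string;
  MinTTL: number; // seconds
  DefaultTTL: number; // seconds
  MaxTTL: number; // seconds
  ParametersInCacheKeyAndForwardedToOrigin: {
    EnableAcceptEncodingGzip: boolean;
    HeadersConfig: { HeaderBehavior: "none" | "whitelist" };
    CookiesConfig: { CookieBehavior: "none" | "whitelist" | "all" };
    QueryStringsConfig: {
      QueryStringBehavior: "none" | "whitelist" | "all";
      QueryStrings?: { Quantity: number; Items: string[] };
    };
  };
}

const ONE_HOUR = 60 * 60;
const ONE_DAY = 24 * ONE_HOUR;

// Assumption: url, w and q are "the appropriate query params" for _next/image.
function nextImageCachePolicy(name: string): CachePolicySketch {
  const queryStrings = ["url", "w", "q"];
  return {
    Name: name,
    MinTTL: ONE_HOUR,
    DefaultTTL: ONE_HOUR,
    MaxTTL: ONE_DAY,
    ParametersInCacheKeyAndForwardedToOrigin: {
      EnableAcceptEncodingGzip: true,
      HeadersConfig: { HeaderBehavior: "none" },
      CookiesConfig: { CookieBehavior: "none" },
      QueryStringsConfig: {
        QueryStringBehavior: "whitelist",
        QueryStrings: { Quantity: queryStrings.length, Items: queryStrings },
      },
    },
  };
}

// "next-image-cache" is a made-up name for illustration.
const policy = nextImageCachePolicy("next-image-cache");
console.log(JSON.stringify(policy, null, 2));
```

Keeping the policy as a plain object like this would at least make it easy to re-apply after a deployment resets it, though that does not address why the reset happens.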

So, basically, where is the cache-control header specified for these images? Is it set by CloudFront, the Lambda@Edge function, or the Next.js image loader? And how can we change it?

What AWS Services are you utilizing?

Provide additional details e.g. code snippets

Example file: https://growth.design/_next/image?url=https%3A%2F%2Fs3.amazonaws.com%2Ffiles.growth.design%2Fcms%2Fcktlzgroz000l0umw6p4sgpf1.png&w=384&q=75

This is the page where this is happening: https://growth.design/case-studies

github-actions[bot] commented 2 years ago

Hi :wave:, thanks for opening! While we look into this...

If this issue is related to custom domains, be sure to check the custom domains troubleshooting guide to see if that helps. Also, there is a more general troubleshooting FAQ that may be helpful for other questions.

Lastly, please make sure you've specified the App ID and Region in the issue!

ghost commented 2 years ago

Hi @JFDontigny👋🏽 You can achieve this through the console by adding a custom header: Setting Custom Headers. If you've already done that, can you confirm that Cache-Control request headers aren't being sent by the client? If they are, Amplify will honor those request headers over custom headers you've set in the console.

Ref: Using headers to control cache duration

JFDontigny commented 2 years ago

I tried setting a custom header in the console directly as well as through the customHttp.yml file, and was getting weird behavior as described above.

Also, the tests were performed in a private browsing session with "Disable cache" unchecked in the browser. As such, the requests did not send a cache-control header.

By setting the cache policy in CF, I get a much older object, one that clearly comes from CF and not Lambda@Edge. For example, I just got a response with an age header of 79845, even though the response cache-control has an s-maxage of 600s. This is fine, since the cache policy has a minimum TTL of 86400s. CF clearly does not call Lambda@Edge in this scenario.
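The arithmetic behind that observation can be sketched as a simple clamp. This assumes the commonly documented CloudFront behavior that the TTL requested by the origin is raised to the policy's minimum TTL and capped at its maximum TTL. The maximum of 86400s is an assumption carried over from the earlier description of the policy, and `effectiveTtl` is a hypothetical helper name.

```typescript
interface TtlBounds {
  minTtl: number; // seconds
  maxTtl: number; // seconds
}

// Clamp the TTL requested by the origin (e.g. s-maxage) into the policy bounds.
function effectiveTtl(originTtl: number, bounds: TtlBounds): number {
  return Math.min(Math.max(originTtl, bounds.minTtl), bounds.maxTtl);
}

// Values from the comment above: s-maxage=600, minimum TTL 86400, observed age 79845.
const ttl = effectiveTtl(600, { minTtl: 86400, maxTtl: 86400 });
const observedAge = 79845;
const stillFresh = observedAge < ttl;

console.log({ ttl, stillFresh });
```

Under these assumptions the cached object stays fresh for the full day regardless of the 600s the origin asked for, which is consistent with seeing an age far above 600.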

When I set the cache-control either in the console or the config file, even with both max-age and s-maxage set to very high values (e.g. 86400), and with public and immutable, the response gets sent to the client, but as far as I can tell, CF still calls the Lambda@Edge function on each subsequent request for the same resource.

The behavior I would expect is for CF to see the cache-control directive as part of the response, especially the s-maxage + public combination, and treat it as an instruction that the object should be stored and reused for subsequent requests, to avoid calling the function each time.
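To make that expectation concrete, here is a small sketch of how a shared cache is generally expected to read a response Cache-Control header under the HTTP caching rules: s-maxage takes precedence over max-age, and private or no-store rule out shared storage. This is a simplified illustration, not CloudFront's actual implementation, and `sharedCacheLifetime` is a hypothetical name.

```typescript
// Returns the freshness lifetime (seconds) a shared cache would use,
// or null when the header gives it nothing to go on or forbids shared storage.
function sharedCacheLifetime(cacheControl: string): number | null {
  const directives = new Map<string, string>();
  for (const part of cacheControl.split(",")) {
    const [rawName, rawValue = ""] = part.trim().split("=");
    if (rawName) {
      directives.set(rawName.toLowerCase(), rawValue.replace(/"/g, ""));
    }
  }

  if (directives.has("private") || directives.has("no-store")) {
    return null;
  }

  // For shared caches, s-maxage overrides max-age.
  for (const name of ["s-maxage", "max-age"]) {
    const value = directives.get(name);
    if (value !== undefined && /^\d+$/.test(value)) {
      return Number(value);
    }
  }
  return null;
}

console.log(sharedCacheLifetime("public, max-age=0, s-maxage=600"));
console.log(sharedCacheLifetime("public, max-age=86400, s-maxage=86400, immutable"));
```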

Is that supposed to be the case? Is it possible that when CF "connects" the incoming call with the Lambda function, it somehow doesn't parse the cache-control header in the response, and as such misses the s-maxage directive? From the point of view of CF, it doesn't seem like the response cache-control header is being understood at all.

Also, to be clear, when the client is sending cache-control or related headers (e.g. if-none-match), then CF and the lambda function do honor the headers, returning for example a 304.

Thanks!

juancarloselorriaga commented 1 year ago

How did you end up handling the cache?

I'm having a similar issue. My images are all stored in Google Storage, and I pass the direct URL to the src of the next/image component. The images take a long time to appear, and even though they have cache headers, they're never really cached: the request is made again if I refresh the page. I've set no custom headers yet because I'm not entirely sure what the path pattern for the images would be. I'm not sure if this could be an issue with my Next Image configuration or because I'm fetching directly from Google Storage instead of a CDN. It just seems to me that images are not being cached at all.

I would appreciate a little help here because I'm stuck!

HemantDubey-ACS commented 2 weeks ago

Is there any solution for this? We are also stuck in the same scenario.