gohugoio / hugoDocs

The source for https://gohugo.io/
Apache License 2.0

Document HTTP cache config #2593

Closed: bep closed this 3 months ago

bep commented 3 months ago

@jmooring, could I borrow your eyes for a moment?

Doc: https://deploy-preview-2593--gohugoio.netlify.app/getting-started/configuration/#configure-http-cache
PR: https://github.com/gohugoio/hugo/pull/12523

There are some test failures, but other than that this works pretty well in my tests. What I'm struggling with is deciding what the config defaults should be. People use resources.GetRemote for lots of "things", and this is mostly valuable for content sources that change (e.g. a CMS, WordPress ...). Below is what's in the PR, but I'm now leaning towards just disabling everything by default. What do you think?

Also note that not every HTTP server out there implements the HTTP spec properly (Netlify does a poor job here; see https://answers.netlify.com/t/server-sometimes-responds-with-http-200-even-when-if-none-match-matches-etag/37852/33?u=bep).

```toml
[HTTPCache]
  [HTTPCache.cache]
    [HTTPCache.cache.for]
      excludes = ['**{.jpg,.jpeg,.png,.webp,.gif,.ttf}']
      includes = ['**']
  [[HTTPCache.polls]]
    disable = true
    high = '0s'
    low = '0s'
    [HTTPCache.polls.for]
      includes = ['**{.jpg,.jpeg,.png,.webp,.gif,.ttf}', 'https://*.{twitter,x,facebook,instagram}.com/**']
  [[HTTPCache.polls]]
    disable = false
    high = '30m0s'
    low = '10m0s'
    [HTTPCache.polls.for]
      includes = ['https://*.{github}.com/**']
  [[HTTPCache.polls]]
    disable = false
    high = '30s'
    low = '1s'
    [HTTPCache.polls.for]
      includes = ['**']
```

jmooring commented 3 months ago

@bep You've obviously given this almost infinitely more thought than I have, but I think disabling by default would be safer.

Reasoning:

  1. This will, in some cases, change existing behavior.
  2. I am concerned that someone may unexpectedly hit a rate limit they never hit before, because today the getresource file cache (caches.getresource) never expires by default.
jmooring commented 3 months ago

I also think we need to craft an example that users can easily replicate to demonstrate the power of this feature, and why in some cases it's better than setting a unique cache expiration for each resources.GetRemote call. Maybe an integration with something like Google Forms for event registrations, a commenting system, an Instagram feed, etc.

This might be best in a tips and tricks article... not sure.

bep commented 3 months ago

> but I think disabling by default would be safer.

Agree. That was my conclusion as well, I guess.

> and why in some cases it's better than setting a unique cache expiration for each resources.GetRemote call.

Well, there are two aspects to this new feature:

  1. Proper handling of ETag and/or Last-Modified headers.
  2. Polling for changes and triggering partial rebuilds.
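Aspect 1) is the standard HTTP conditional request: the client keeps the `ETag` from a response and sends it back in `If-None-Match`, and a server that implements this correctly answers `304 Not Modified` with no body when nothing has changed. A minimal Go sketch using only `net/http` and a local `httptest` server; the `fetch`/`newServer` helpers and the JSON payload are hypothetical, and this is not Hugo's implementation.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch performs a GET, sending If-None-Match when we already hold an ETag.
// It returns the status code, the body (empty on 304), and the ETag to keep.
func fetch(client *http.Client, url, etag string) (int, string, string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", "", err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", "", err
	}
	if resp.StatusCode == http.StatusNotModified {
		return resp.StatusCode, "", etag, nil // keep the cached copy and ETag
	}
	return resp.StatusCode, string(body), resp.Header.Get("ETag"), nil
}

// newServer serves a fixed resource and honors If-None-Match
// (simplified: exact match only, no weak ETags or ETag lists).
func newServer() *httptest.Server {
	const etag = `"v1"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified) // unchanged: send no body
			return
		}
		fmt.Fprint(w, `{"posts": 42}`)
	}))
}

func main() {
	srv := newServer()
	defer srv.Close()

	status, body, etag, _ := fetch(srv.Client(), srv.URL, "")
	fmt.Println(status, body, etag) // first request: full download
	status, body, _, _ = fetch(srv.Client(), srv.URL, etag)
	fmt.Println(status, body == "") // second request: 304, nothing downloaded
}
```

A server that ignores `If-None-Match` and always replies `200` still works with this client; it just loses the bandwidth savings, because every poll becomes a full download.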

Obviously, 2) works better with HTTP servers that have sensible HTTP cache implementations (you avoid downloading a 50 MB JSON file when nothing has changed), but it would also work fine with the {{ $cacheKey := print $url (now.Format "2006-01-02") }} strategy.

bep commented 3 months ago

OK, I'll merge this PR as part of 0.127.0; it should be fine. I will write some technical notes here in case someone wants to elaborate: