caddyserver / cache-handler

Distributed HTTP caching module for Caddy
Apache License 2.0
235 stars 17 forks source link

Cannot match current iteration key ETag #58

Closed jpds closed 10 months ago

jpds commented 11 months ago

I have this plugin configured as follows:

{
    on_demand_tls {
        ask http://localhost:3903/check
        interval 2m
        burst 5
    }

    order cache before rewrite

    cache {
        allowed_http_verbs GET
        default_cache_control public
        nuts {
            path /var/db/caddy/souin/
        }
        ttl 8h
    }
}

https:// {
    cache {
        timeout {
            backend 30s
        }
    }

    tls {
        on_demand
    }

    reverse_proxy localhost:3902
}

However, every time I hit one of the domains fronted by this configuration, I see this in the logs:

$ curl -v https://www.domain.com/
...
 TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 200
< accept-ranges: bytes
< alt-svc: h3=":443"; ma=2592000
< cache-control: public
< cache-status: Souin; fwd=uri-miss; stored; key=GET-https-www.domain.com-/
< content-security-policy: default-src 'self';
< content-type: text/html
< date: Sun, 30 Jul 2023 22:46:54 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
...

And the ETag never changes between curl runs, however in the logs I then see:

{"level":"info","ts":1690757113.7676728,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757115.2038972,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757116.2018719,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757117.201102,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757118.0917957,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
darkweak commented 11 months ago

Hello @jpds thank you for this feedback and the detailed issue.
AFAIK we're not handling the case the stored response contains an ETag and the client doesn't send one. We will fix that asap.

jpds commented 10 months ago

I built the linked PR with:

$ xcaddy build v2.7.3 --with github.com/darkweak/souin@7cc8ae262d11cc1788e0f2c5974af40d717452d8 --with github.com/caddyserver/cache-handle
...
go: added github.com/darkweak/souin v1.6.40-0.20230810001327-7cc8ae262d11
...
go: added github.com/caddyserver/cache-handler v0.8.0

However, the ETag still does not seem to be cached, I tripled tested this with Firefox/Chromium/curl:

{"level":"debug","ts":1691685939.2154143,"logger":"http.handlers.cache","msg":"Incomming request &{Method:GET URL:/ Proto:HTTP/3.0 ProtoMajor:3 ProtoMinor:0 Header:map[Accept:[text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8] Accept-Encoding:[gzip, deflate, br] Accept-Language:[en-US,en;q=0.5] Alt-Used:[www.domain.com] If-Modified-Since:[Thu, 02 Feb 2023 12:56:22 GMT] If-None-Match:[\"636849236767b0deb94dd80d04c5bc74\"] Priority:[u=1] Sec-Fetch-Dest:[document] Sec-Fetch-Mode:[navigate] Sec-Fetch-Site:[none] Sec-Fetch-User:[?1] Upgrade-Insecure-Requests:[1] User-Agent:[Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0]] Body:0x8656ce1e0 GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:www.domain.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:[2a0c:XXXX:XX::XXXX]:59406 RequestURI:/ TLS:0x864de7b80 Cancel:<nil> Response:<nil> ctx:0x86561b560}"}
{"level":"debug","ts":1691685939.2157462,"logger":"http.handlers.cache","msg":"Request cache-control &{MaxAge:-1 MaxStale:-1 MaxStaleSet:false MinFresh:-1 NoCache:false NoStore:false NoTransform:false OnlyIfCached:false StaleIfError:0 Extensions:[]}"}
{"level":"debug","ts":1691685939.215986,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"debug","ts":1691685939.2161043,"logger":"http.handlers.cache","msg":"Request the upstream server"}
{"level":"debug","ts":1691685939.2208962,"logger":"http.handlers.reverse_proxy","msg":"selected upstream","dial":"localhost:3902","total_upstreams":1}
{"level":"debug","ts":1691685939.2755642,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"localhost:3902","duration":0.054328866,"request":{"remote_ip":"2a0c:XXXX:XX::XXXX","remote_port":"59406","client_ip":"2a0c:XXXX:XX::XXXX","proto":"HTTP/3.0","method":"GET","host":"www.domain.com","uri":"/","headers":{"Date":["Thu, 10 Aug 2023 16:45:39 UTC"],"Priority":["u=1"],"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"],"Sec-Fetch-Site":["none"],"X-Forwarded-Proto":["https"],"Upgrade-Insecure-Requests":["1"],"Sec-Fetch-User":["?1"],"Accept-Encoding":["gzip, deflate, br"],"Sec-Fetch-Dest":["document"],"X-Forwarded-Host":["www.domain.com"],"Sec-Fetch-Mode":["navigate"],"If-None-Match":["\"636849236767b0deb94dd80d04c5bc74\""],"If-Modified-Since":["Thu, 02 Feb 2023 12:56:22 GMT"],"Accept-Language":["en-US,en;q=0.5"],"Alt-Used":["www.domain.com"],"X-Forwarded-For":["2a0c:XXXX:XX::XXXX"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"]},"tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h3","server_name":"www.domain.com"}},"headers":{"Date":["Thu, 10 Aug 2023 16:45:39 GMT"]},"status":304}

curl always returns:

< cache-control: public, max-age=86400
< cache-status: Souin; fwd=uri-miss; stored; key=GET-https-www.domain.com-/
< date: Thu, 10 Aug 2023 16:55:39 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
darkweak commented 10 months ago

@jpds I updated the codebase in the PR and added a unit test about this feature. 🤞

jpds commented 10 months ago

@darkweak Thank you very much for the fix. I can confirm that the logs say:

{"level":"debug","ts":1691712666.6366026,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ matched the current iteration key ETag GET-https-www.domain.com-/"}

However, every request now seems to request revalidation:

< cache-control: public, max-age=86400
< cache-status: Souin; fwd=request; fwd-status=200; key=GET-https-www.domain.com-/; detail=REQUEST-REVALIDATION
< content-type: text/html
< date: Fri, 11 Aug 2023 00:38:44 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT

Is there a configuration option I'm missing?

Edit: I've also just discovered that the backend I'm using also does caching and here's the logic they use: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/67b1457c77d930cd12f83f5cf07ec7f2376fa00c/src/api/s3/get.rs#L52-L89

darkweak commented 10 months ago

@jpds I updated the debug message to display the current validator (that contains the ETagResponse and the ETagRequest) with that you'll be able to find if there is an ETag mismatch.

jpds commented 10 months ago

None of my client requests will be sending ETags, I believe - here's the new log output:

{"level":"debug","ts":1691742614.1066208,"logger":"http.handlers.cache","msg":"Incomming request &{Method:GET URL:/ Proto:HTTP/2.0 ProtoMajor:2 ProtoMinor:0 Header:map[Accept:[*/*] User-Agent: [curl/8.0.1]] Body:0x8743a4210 GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:www.domain.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:[2a0c:XXXX:XX::XXXX]:55244 RequestURI:/ TLS:0x853eea4d0 Cancel:<nil> Response:<nil> ctx:0x874332ed0}"}
{"level":"debug","ts":1691742614.106806,"logger":"http.handlers.cache","msg":"Request cache-control &{MaxAge:-1 MaxStale:-1 MaxStaleSet:false MinFresh:-1 NoCache:false NoStore:false NoTransform:false OnlyIfCached:false StaleIfError:0 Extensions:[]}"}
{"level":"debug","ts":1691742614.1069987,"logger":"http.handlers.cache","msg":"The stored key GET-https-www.domain.com-/ matched the current iteration key ETag &{Matched:true IfNoneMatchPresent:false IfMatchPresent:false IfModifiedSincePresent:false IfUnmodifiedSincePresent:false IfUnmotModifiedSincePresent:false NeedRevalidation:true IfModifiedSince:0001-01-01 00:00:00 +0000 UTC IfUnmodifiedSince:0001-01-01 00:00:00 +0000 UTC IfNoneMatch:[] IfMatch:[] RequestETag: ResponseETag:\"636849236767b0deb94dd80d04c5bc74\"}"}
{"level":"debug","ts":1691742614.1070535,"logger":"http.handlers.cache","msg":"Revalidate the request with the upstream server"}
{"level":"debug","ts":1691742614.1091118,"logger":"http.handlers.reverse_proxy","msg":"selected upstream","dial":"localhost:3902","total_upstreams":1}
{"level":"debug","ts":1691742614.1201108,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"localhost:3902","duration":0.010700285,"request":{"remote_ip":"2a0c:XXXX:XX::XXXX","remote_port":"55244","client_ip":"2a0c:XXXX:XX::XXXX","proto":"HTTP/2.0","method":"GET","host":"www.domain.com","uri":"/","headers":{"User-Agent":["curl/8.0.1"],"Accept":["*/*"],"Date":["Fri, 11 Aug 2023 08:30:14 UTC"],"Etag":["\"636849236767b0deb94dd80d04c5bc74\""],"X-Forwarded-For":["2a0c:XXXX:XX::XXXX"],"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["www.domain.com"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"www.domain.com"}},"headers":{"Accept-Ranges":["bytes"],"Etag":["\"636849236767b0deb94dd80d04c5bc74\""],"Content-Length":["777"],"Date":["Fri, 11 Aug 2023 08:30:14 GMT"],"Content-Type":["text/html"],"Last-Modified":["Thu, 02 Feb 2023 12:56:22 GMT"]},"status":200}
{"level":"debug","ts":1691742614.1225867,"logger":"http.handlers.cache","msg":"Response cache-control &{MustRevalidate:false NoCache:map[] NoCachePresent:false NoStore:false NoTransform:false Public:true Private:map[] PrivatePresent:false ProxyRevalidate:false MaxAge:86400 SMaxAge:-1 Immutable:false StaleIfError:-1 StaleWhileRevalidate:-1 Extensions:[]}"}
{"level":"debug","ts":1691742614.1228776,"logger":"http.handlers.cache","msg":"Store the response {Status: StatusCode:200 Proto: ProtoMajor:0 ProtoMinor:0 Header:map[Accept-Ranges:[bytes] Alt-Svc:[h3=\":443\"; ma=2592000] Cache-Control:[public, max-age=86400] Content-Length:[777] Content-Security-Policy:[default-src 'self';] Content-Type:[text/html] Date:[Fri, 11 Aug 2023 08:30:14 GMT] Etag:[\"636849236767b0deb94dd80d04c5bc74\"] Last-Modified:[Thu, 02 Feb 2023 12:56:22 GMT] Permissions-Policy:[interest-cohort=()] Referrer-Policy:[no-referrer-when-downgrade] Server:[Caddy] Strict-Transport-Security:[max-age=31536000;] X-Content-Type-Options:[nosniff] X-Frame-Options:[DENY] X-Souin-Stored-Ttl:[24h0m0s]] Body:{Reader:...

All the ETags involved are coming from the backend server - hence I was wondering if there was some mismatched expectations in what the backend is proposing to Souin.

Edit: and the backend in question is a static S3 bucket with simple web content: https://garagehq.deuxfleurs.fr/documentation/cookbook/exposing-websites/ - hence I was hoping that Caddy could cache the latest copy of this content locally without having to ask the backend storage servers what was available.

jpds commented 10 months ago

I've been playing with curl ( https://daniel.haxx.se/blog/2019/12/06/curl-speaks-etag/ ) to try and better understand this and I've found:

With an ETag, curl bounces between REQUEST-REVALIDATION and UNCACHEABLE-STATUS-CODE:

# First attempt
$ curl -v --etag-compare etag.txt --etag-save etag.txt https://www.domain.com -o saved-file
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-none-match: ""]
* Using Stream ID: 1 (easy handle 0x55bb424d1350)
} [5 bytes data]
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-none-match: ""
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [130 bytes data]
< HTTP/2 200
< accept-ranges: bytes
< alt-svc: h3=":443"; ma=2592000
< cache-control: public, max-age=86400
< cache-status: Souin; fwd=request; fwd-status=200; key=GET-https-www.domain.com-/; detail=REQUEST-REVALIDATION
< content-type: text/html
< date: Fri, 11 Aug 2023 10:00:35 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 777
<
{ [5 bytes data]
100   777  100   777    0     0   3870      0 --:--:-- --:--:-- --:--:--  3885

# Second attempt
$ curl -v --etag-compare etag.txt --etag-save etag.txt https://www.domain.com -o saved-file
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-none-match: "636849236767b0deb94dd80d04c5bc74"]
* Using Stream ID: 1 (easy handle 0x55bdb2de0350)
} [5 bytes data]
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-none-match: "636849236767b0deb94dd80d04c5bc74"
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [130 bytes data]
< HTTP/2 304
< alt-svc: h3=":443"; ma=2592000
< cache-status: Souin; fwd=uri-miss; key=GET-https-www.domain.com-/; detail=UNCACHEABLE-STATUS-CODE
< date: Fri, 11 Aug 2023 10:07:37 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 0
<
{ [0 bytes data]
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

If I use the older if-modified-since method, I then see Souin hits:

$ curl -v -z "11 Aug 2023" https://www.domain.com
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-modified-since: Fri, 11 Aug 2023 00:00:00 GMT]
* Using Stream ID: 1 (easy handle 0x563f2a52c350)
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-modified-since: Fri, 11 Aug 2023 00:00:00 GMT
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 200
< accept-ranges: bytes
< age: 126
< alt-svc: h3=":443"; ma=2592000
< cache-control: public, max-age=86400
< cache-status: Souin; hit; ttl=86274; key=GET-https-www.domain.com-/
< content-type: text/html
< date: Fri, 11 Aug 2023 10:07:35 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 777
<
* The requested document is not new enough
* Simulate an HTTP 304 response
darkweak commented 10 months ago

Hey @jpds sorry for the delay. I just updated the PR, let me know if there are other weird behaviours.

jpds commented 10 months ago

@darkweak Now it is all working - perfectly :tada: . Seeing hits everywhere.