caddyserver / cache-handler

Distributed HTTP caching module for Caddy
Apache License 2.0

Disable Request Coalescing #82

Open DzigaV opened 1 month ago

DzigaV commented 1 month ago

I have some endpoints that will serve a mix of cacheable and uncacheable content. I don't want to run into any performance or security issues caused by request coalescing. How do I disable it?

darkweak commented 1 month ago

Hey @DzigaV, AFAIK that's not possible.
Request coalescing sends only one request to your upstream for a given key, so if the generated keys are equal, it should return the same data, right?

DzigaV commented 1 month ago

I have a very particular use-case. Sometimes for a given key, the backend will return a static, cacheable response. Sometimes for the same key, the backend will return a dynamic, private response, with Cache-Control "no-cache, no-store".

This is what I've done in nginx for years. I was just reading Souin docs when I realised you have request coalescing on by default, which isn't something I'd want enabled, given security and performance pitfalls.

darkweak commented 1 month ago

I'm wondering if the coalescing system will send only one request to the backend if an authenticated user and an anonymous user try to send the request on the same endpoint (just having an Authorization HTTP request header on the authenticated user). I'll have to do some tests.

Btw, if the response is private, the cache won't store it.

DzigaV commented 1 month ago

CDNs acknowledge the pitfalls of coalescing, even when they enable it by default. The various workarounds are complicated; the simplest solution is to disable it.

> I'm wondering if the coalescing system will send only one request to the backend if an authenticated user and an anonymous user try to send the request on the same endpoint (just having an Authorization HTTP request header on the authenticated user). I'll have to do some tests.

The presence or absence of a particular header is a red herring. I need the origin to dictate what is and isn't cached, regardless of who the client is. Here is a hypothetical scenario that reflects my use-case:

Let's say I have a public asset (/blog/1), that I want to cache for 60 minutes. It's stored in the Souin cache (TTL 60m), which serves all subsequent requests for the next 60 mins. In the meantime, I:

a) make this blog post available only to authenticated users.
b) generate different versions of this blog post for each of those users.

After the cache entry expires, multiple simultaneous requests are received for /blog/1. One from an unauthenticated visitor, another from a user whose credentials are no longer valid, one from User A and another from User B. You get the picture.

What does Souin do? Has it identified /blog/1 as a target for coalescing, because it was previously cached? Is it going to send the first request to the backend, then serve the same response to all coalesced requests, even if the backend returns no-cache headers? Obviously that would be a security disaster.

What if, given the no-cache header, Souin decides not to serve that response to all clients? Does it unlock all the other requests and then pass them to the backend? That sounds like a performance nightmare. How is this queue handled?

Then, to complicate things even further, all personalised versions of /blog/1 are deleted after a period of time. It once again becomes a static, public, cacheable asset, for the next 60 minutes. Does Souin refuse to cache it? Again, this would be a performance nightmare.

Personally, I don't think request coalescing should be default behaviour for any cache. In nginx, proxy_cache_lock is off by default.
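For reference, nginx exposes coalescing as an opt-in directive. A sketch (paths and upstream address are illustrative):

```nginx
# nginx: coalescing (proxy_cache_lock) is opt-in and defaults to "off".
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    listen 80;
    location / {
        proxy_cache app_cache;
        # proxy_cache_lock on;          # opt-in request coalescing
        # proxy_cache_lock_timeout 5s;  # waiters go to the backend after this
        proxy_pass http://127.0.0.1:8080;
    }
}
```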

But, I get it. It's a sexy performance feature. I'm sure most people are happy with it. But please give us a way to disable it. Otherwise I'm going to have to run nginx as a sidecar just to get simple proxy caching, which defeats the whole point of using Caddy.

darkweak commented 1 month ago

I get it, BUT

I hear you about a configuration key to disable the coalescing system, but the performance issues without coalescing could be greater than with the current system. I think what you're trying to achieve could be done by configuring your app responses with a Vary header, or by using the header_down subdirective.

IMHO your use-case is very specific, and we should discuss all the options we have.
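A hedged sketch of what that suggestion might look like in a Caddyfile (site address, upstream, and the chosen Vary value are all illustrative, and this does not address the scenario above — it only shows the subdirective):

```caddyfile
example.com {
	cache
	reverse_proxy 127.0.0.1:8080 {
		# e.g. append a Vary so per-user variants get distinct cache entries
		header_down +Vary Authorization
	}
}
```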

DzigaV commented 4 weeks ago

Thank you for looking at this @darkweak. I appreciate you spending the time. I don't use the Authorization header and I can't rearchitect an application to suit coalescing. I don't want unique cache keys per user. There are no performance downsides to disabling coalescing for me.

What I do sounds quirky, but it's been in production on nginx for 10 years. My particular use-case is a distraction, though, I think. What really matters is that asset cacheability, access policies and implementation details all change.

Nginx has no coalescing by default. Varnish lets you disable it, likewise the CDNs (using Varnish or otherwise). They point out the severe performance (and possibly security) issues with coalescing, when asset cacheability is changeable. Uncacheable requests being queued and executed in series, for example.

Forcing coalescing dictates behaviour to the origin inappropriately, IMO. Can you help a brother out and create a global option to disable it? Otherwise I'm f***ed, not to put too fine a point on it.

darkweak commented 2 weeks ago

@DzigaV can you try with `xcaddy build --with github.com/darkweak/souin/plugins/caddy@46601ccfb6a445669bfad6665c5139c671b962d0 --with github.com/darkweak/souin@46601ccfb6a445669bfad6665c5139c671b962d0` and tell me if that works as expected, please?

DzigaV commented 2 weeks ago

You legend. I will try it as soon as I can.

DzigaV commented 2 weeks ago

@darkweak Looks to be working. No more reused responses.