KevinDockx / HttpCacheHeaders

ASP.NET Core middleware that adds HttpCache headers to responses (Cache-Control, Expires, ETag, Last-Modified), and implements cache expiration & validation models
MIT License
ETag does not check whether is still valid #105

Closed scholtz closed 1 year ago

scholtz commented 2 years ago

Hi, we had a severe caching issue when running this library on multiple instances in k8s.

According to https://simonhearne.com/2022/caching-header-best-practices/

This If-None-Match header is a message to the server that the client has a version of the asset in cache. The server can then check to see whether this is still a valid version of the asset - if so, we will receive an empty 304 response with another ETag which will match the original

Note the part "The server can then check to see whether this is still a valid version". With this caching library, nothing is checked again: if the ETag is found in the store, the headers are served from the last run and the method is not executed.

We therefore worked around it with our own store key generator, which lets us set how long a cache key stays valid. This solution is not ideal: (unix % 80000) is part of the cache key, so if we set the cache to 80000 seconds it will probably be refreshed within the day. It does solve the primary issue, though.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class TimedStoreKeyGenerator : Marvin.Cache.Headers.Interfaces.IStoreKeyGenerator
    {
        public static int Validity = 3;// seconds
        public static Func<IServiceProvider, Marvin.Cache.Headers.Interfaces.IStoreKeyGenerator> Instance = serviceProvider => new TimedStoreKeyGenerator();
        public System.Threading.Tasks.Task<Marvin.Cache.Headers.StoreKey> GenerateStoreKey(Marvin.Cache.Headers.Domain.StoreKeyContext context)
        {
            // generate a key to store the entity tag with in the entity tag store
            List<string> requestHeaderValues;

            // get the request headers to take into account (VaryBy) & take
            // their values        
            if (context.VaryByAll)
            {
                requestHeaderValues = context.HttpRequest
                        .Headers
                        .SelectMany(h => h.Value)
                        .ToList();
            }
            else
            {
                requestHeaderValues = context.HttpRequest
                        .Headers
                        .Where(x => context.Vary.Any(h =>
                            h.Equals(x.Key, StringComparison.CurrentCultureIgnoreCase)))
                        .SelectMany(h => h.Value)
                        .ToList();
            }

            // get the resource path
            var resourcePath = context.HttpRequest.Path.ToString();

            // get the query string
            var queryString = context.HttpRequest.QueryString.ToString();

            // time component of the key; the intent is a new cache key every Validity seconds
            var time = (DateTimeOffset.Now.ToUnixTimeSeconds() % Validity).ToString();
            // combine these
            return Task.FromResult(new Marvin.Cache.Headers.StoreKey
            {
                { nameof(resourcePath), resourcePath },
                { nameof(queryString), queryString },
                { nameof(requestHeaderValues), string.Join("-", requestHeaderValues)},
                { "time", time }
            });
        }
    }
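
One thing worth double-checking in the generator above, offered as an observation rather than a confirmed fix: a remainder (unix % Validity) changes every second and repeats every Validity seconds, whereas integer division (unix / Validity) stays constant for a whole window of Validity seconds. If the intent is "a new key every Validity seconds", division is probably the operation that expresses it. A minimal standalone sketch follows; TimeBucket and its method names are made up for illustration and are not part of the library.

```csharp
using System;

public static class TimeBucket
{
    // Remainder: a different value each second, cycling through 0..validitySeconds-1.
    public static long Remainder(long unixSeconds, int validitySeconds)
        => unixSeconds % validitySeconds;

    // Integer division: the same value for every second inside one window.
    public static long Window(long unixSeconds, int validitySeconds)
        => unixSeconds / validitySeconds;
}
```

With a validity of 3, the seconds 9, 10 and 11 give remainders 0, 1 and 2 but the same window number 3, and second 12 starts window 4. Windows are aligned to the Unix epoch, so an entry can still be dropped sooner than Validity seconds after it was stored, which matches the "not ideal" caveat above.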

Initialized as

public static class StartupExtension
    {
        public static void AddCache(this IServiceCollection services, int duration)
        {
            TimedStoreKeyGenerator.Validity = duration;
            services.AddHttpCacheHeaders(
                (expirationModelOptions) =>
                {
                    expirationModelOptions.MaxAge = duration;
                },

                (validationModelOptions) =>
                {
                    validationModelOptions.MustRevalidate = true;
                    validationModelOptions.ProxyRevalidate = true;
                },
                storeKeyGeneratorFunc: TimedStoreKeyGenerator.Instance
                );
        }
    }

In startup .net 3,5

services.AddCache(3);

.net 6

builder.Services.AddCache(3600 * 20);

@KevinDockx, if you have any insight, or can point us to best practices in case we missed something, please do.

Thanks

toddb commented 1 year ago

Thanks @scholtz, I have had the same problem on my cluster too.

@KevinDockx, I'm also wondering about your thoughts.

ps, thanks for the library

KevinDockx commented 1 year ago

First thing to check: does it work when running on one instance, or not? If it does, then the issue most probably lies in the fact that you're using multiple instances with the default value stores (and related): these are in-memory stores. To make them reliably work across servers, you'd need an implementation that doesn't rely on the server where the request ends up at (eg: a distributed cache for storing them, a DB, ...).
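
To illustrate why per-process stores break down across instances, here is a small standalone sketch. EtagStore and MultiInstanceDemo are hypothetical stand-ins and not the library's store interface; the shared dictionary merely plays the role of a distributed cache or DB.

```csharp
using System.Collections.Concurrent;

// Hypothetical stand-in for a validator value store. This is NOT the
// library's store interface; it only illustrates the idea.
public sealed class EtagStore
{
    private readonly ConcurrentDictionary<string, string> _backing;

    // The backing dictionary stands in for wherever the values live:
    // per-process memory, or something shared such as a distributed cache or DB.
    public EtagStore(ConcurrentDictionary<string, string> backing)
        => _backing = backing;

    public void Set(string storeKey, string etag) => _backing[storeKey] = etag;

    public bool TryGet(string storeKey, out string etag)
        => _backing.TryGetValue(storeKey, out etag);
}

public static class MultiInstanceDemo
{
    // Two instances, each with its own in-memory backing:
    // the second instance never sees what the first one stored.
    public static bool SeenWithPrivateStores()
    {
        var instanceA = new EtagStore(new ConcurrentDictionary<string, string>());
        var instanceB = new EtagStore(new ConcurrentDictionary<string, string>());
        instanceA.Set("/api/values", "\"abc\"");
        return instanceB.TryGet("/api/values", out _);
    }

    // Two instances on top of one shared backing: both see the same value.
    public static bool SeenWithSharedStore()
    {
        var shared = new ConcurrentDictionary<string, string>();
        var instanceA = new EtagStore(shared);
        var instanceB = new EtagStore(shared);
        instanceA.Set("/api/values", "\"abc\"");
        return instanceB.TryGet("/api/values", out _);
    }
}
```

SeenWithPrivateStores returns false and SeenWithSharedStore returns true, which is the difference between the default in-memory stores behind a load balancer and a store implementation that all instances share.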

toddb commented 1 year ago

Thanks for the response.

First thing to check: does it work when running on one instance, or not?

For me, no.

It behaves as described above, so I have tended to send cache-control: no-cache in the request. Also, I have your source code and can see the InMemoryValidatorValueStoreFacts tests; I assume they would need extending to cover such scenarios.

If it does, then the issue most probably lies in the fact that you're using multiple instances with the default value stores (and related): these are in-memory stores. To make them reliably work across servers, you'd need an implementation that doesn't rely on the server where the request ends up at (eg: a distributed cache for storing them, a DB, ...).

Agreed (and want to avoid that!)

KevinDockx commented 1 year ago

Ok, I got some time put aside next Tuesday to investigate this further.

toddb commented 1 year ago

Thanks @KevinDockx. I have had some time to return to this issue to check whether I have a misunderstanding (I noticed #95 and #82 are related). I think I rushed to the conclusion that this is a break in the library. Apologies if I have; everyone is busy.

I am not convinced there is a problem with your library, but I agree that it does not implement the "can" in "the server can then check to see whether this is still a valid version of the asset". My sense is that doing so would add complexity that you have been keeping out of the library (and specifically out of the configuration).

So I suspect I am backtracking, because the issue is expiration, which your sample StoreManipulationController demonstrates. If you haven't got that right, then implementing this approach is probably just compensating for it.

The benefit I see in adding this is a configuration that avoids putting in place a mechanism to work across multiple instances, as you point out.

Finally, if you did add this, I could see that changing the IETagGenerator interface to allow injection of different ETag sources would be good. That is, the middleware only passes through the response body as a string, but I might have saved that ETag previously and want to inject it again (I'm assuming by exposing the httpContext or the like).

KevinDockx commented 1 year ago

I had a look at this, and I think some of the confusion comes from a misunderstanding of what this package is actually for: this package is not a cache (cf. the readme). It generates cache-related headers and allows caches to check validation/expiration-related logic against the origin server. It's thus made to work together with a server cache in front of it (unless you're only using it for concurrency checks).

The If-None-Match header check as mentioned in the original post is something that must be checked by the cache, which can then potentially validate the request with the server (the API with this middleware in the pipeline).
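
For illustration, the validation decision on a conditional GET can be sketched roughly as follows. This is a simplified, standalone sketch and not the middleware's actual code: ConditionalGet and StatusFor are hypothetical names, the header parsing is deliberately naive, and the resource is assumed to exist.

```csharp
using System;
using System.Linq;

public static class ConditionalGet
{
    // Weak comparison: ignore a leading "W/" and compare the quoted value.
    private static string Normalize(string tag)
    {
        var t = tag.Trim();
        return t.StartsWith("W/", StringComparison.Ordinal) ? t.Substring(2) : t;
    }

    // Returns 304 when If-None-Match matches the current ETag, otherwise 200.
    // Naive parsing: splits on commas, so it assumes tags contain no commas.
    public static int StatusFor(string ifNoneMatch, string currentEtag)
    {
        if (string.IsNullOrWhiteSpace(ifNoneMatch)) return 200; // unconditional request
        if (ifNoneMatch.Trim() == "*") return 304;              // "*" matches any existing representation
        var current = Normalize(currentEtag);
        return ifNoneMatch.Split(',').Select(Normalize).Any(t => t == current) ? 304 : 200;
    }
}
```

A request carrying the ETag that is still current gets 304 with no body, while a stale or missing ETag gets 200 with a fresh representation. Whether the stored ETag itself is still accurate depends on the store behind it, which is the point about store implementations in this thread.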

I'm going to close this as I think it currently works as designed.

FYI: if you're going to use multiple instances, you'll always need store implementations that are aware of that. The default in-memory stores are not sufficient for such a scenario.

Hope this clarifies things a bit! :)