aliostad / CacheCow

An implementation of HTTP Caching in .NET Core and 4.5.2+ for both the client and the server

PUT operations do not invalidate cache (RFC7234 Section 4.4) #258

Closed BrettJaner closed 3 years ago

BrettJaner commented 3 years ago

Hi Ali,

I'd like to bring an issue I'm seeing to your attention. Based on Section 4.4 of RFC7234, I'd expect my client cache to invalidate a resource if I make an unsafe request against that URI. However, that's not the behavior I'm experiencing with CacheCow.Client. Here's a quick repro with its output:

using System;
using System.Net.Http;
using System.Text;

using CacheCow.Client;

namespace CacheCowBugRepro
{
    class Program
    {
        public static void Main(string[] args)
        {
            using (var client = new HttpClient(
                new CachingHandler
                {
                    InnerHandler = new HttpClientHandler()
                })
            {
                BaseAddress = new Uri("https://chrome-caching-bug-repro.azurewebsites.net")
            })
            {
                var myNumber = client.GetAsync("my-number").Result.Content.ReadAsStringAsync().Result;
                Console.WriteLine($"GET my-number (origin server): {myNumber}");

                myNumber = client.GetAsync("my-number").Result.Content.ReadAsStringAsync().Result;
                Console.WriteLine($"GET my-number (from cache): {myNumber}");

                myNumber = client.PutAsync("my-number", new StringContent("1", Encoding.UTF8, "application/json")).Result.Content.ReadAsStringAsync().Result;
                Console.WriteLine($"PUT my-number (origin server): {myNumber}");

                myNumber = client.GetAsync("my-number").Result.Content.ReadAsStringAsync().Result;
                Console.WriteLine($"GET my-number (still from cache): {myNumber}");
            }
        }
    }
}

Output:

GET my-number (origin server): 0
GET my-number (from cache): 0
PUT my-number (origin server): 1
GET my-number (still from cache): 0

If you deem this to be Pull Request worthy, feel free to tap me for the additional work.

Thanks, Brett

aliostad commented 3 years ago

Hi, as far as I am concerned this is a missing feature, not a bug.

The reality is that the cases where this instance of the CacheCow client is the only client making unsafe calls are pretty rare in production, hence you need to design the cache in a way that either tolerates the cache expiry time or uses max-age=0 and simply validates all the time.
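
To illustrate the max-age=0 / validate-all-the-time option, here is a minimal sketch in plain ASP.NET Core (this is not CacheCow.Server's own API, and the route and ETag scheme are invented for the example): the server sends an ETag together with Cache-Control: max-age=0, so clients may keep a copy but must revalidate it before every reuse, and an unchanged resource costs only a 304.

    using System;
    using Microsoft.AspNetCore.Http;
    using Microsoft.AspNetCore.Mvc;
    using Microsoft.Net.Http.Headers;

    [ApiController]
    [Route("my-number")]
    public class MyNumberController : ControllerBase
    {
        private static int _myNumber;                          // illustrative state only
        private static string CurrentETag => $"\"{_myNumber}\"";

        [HttpGet]
        public IActionResult Get()
        {
            // The client sends back the ETag it cached; if it still matches,
            // a 304 is enough and no body travels.
            if (Request.Headers[HeaderNames.IfNoneMatch] == CurrentETag)
                return StatusCode(StatusCodes.Status304NotModified);

            // max-age=0: "you may store this, but revalidate before reusing it".
            var headers = Response.GetTypedHeaders();
            headers.CacheControl = new CacheControlHeaderValue { MaxAge = TimeSpan.Zero };
            headers.ETag = new EntityTagHeaderValue(CurrentETag);
            return Ok(_myNumber);
        }
    }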

The point is, even if we invalidate the cache on this instance of CacheCow, other clients could be making similar calls and this instance would be none the wiser.

Also, this requires that I be able to clear a large part of my cache based on the URL, since I could have different representations of the same resource. Many of the storage mechanisms supported in CacheCow are simple key-value stores and do not offer a search feature - currently Redis and File.
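
To make the storage constraint concrete, here is a hypothetical in-memory illustration (it has nothing to do with CacheCow's actual store API, and the keys and URLs are invented): a single URI can be cached under several keys, one per representation, so invalidating everything for that URI requires scanning keys, which a plain key-value store does not support out of the box.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical cache: key = method + URI + varying headers, value = cached body.
    var cache = new Dictionary<string, string>
    {
        ["GET https://example.org/my-number Accept:application/json"] = "{\"value\":0}",
        ["GET https://example.org/my-number Accept:text/plain"] = "0",
        ["GET https://example.org/other-resource"] = "..."
    };

    // Invalidating the URI after a PUT means finding every key that belongs to it.
    // That is trivial for an in-memory dictionary, but a key-value store such as
    // Redis or a file store needs a key search/scan feature to answer
    // "give me all keys for this URI".
    var staleKeys = cache.Keys
        .Where(key => key.Contains("https://example.org/my-number"))
        .ToList();

    foreach (var key in staleKeys)
        cache.Remove(key);

    Console.WriteLine($"Removed {staleKeys.Count} cached representations.");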

In the previous version of CacheCow I had such features behind optional interfaces, but they caused a lot of headaches. So I am not going to be able to provide this feature. Sorry.

BrettJaner commented 3 years ago

Thanks for the response, and I understand that some features aren't worth the work and maintenance. For what it's worth, I'd like to give you a little more background on my use case and hopefully get your opinion. We would be tolerating cache expiration between clients; I'm not necessarily worried about updating all clients when a PUT request happens. However, if I'm the user doing the PUT request, I'd like my own cache to be up to date so I have confirmation that my PUT operation was successful. Does this use case sound reasonable and worth pursuing, or should I be doing something else?

aliostad commented 3 years ago

Thanks for elaborating.

What I am still not sure about is whether this is a test scenario or a production case. As I said, for a production scenario, calls from other clients will be a concern unless it is a single client running on a single node, in which case the caching itself probably does not matter (I have been using this with dozens of clients where the required throughput was ~2000 RPS).

So it would be good to know either way, and I should be able to help.

BrettJaner commented 3 years ago

This use case is something I was planning to do in production. Even in the production scenario, why would the cache being stale for other clients be a concern when I'm OK with their cache being stale? What am I missing? The only reason I want the local cache invalidated for the single client is so that I can give that client feedback that their update was successful.

For example, say the web API I'm applying the caching to is serving a SPA browser application. The application works much like a spreadsheet: lots of editable data on a page. Now if I edit a page, I'd like my cache to be invalidated so that when I refresh the page, I can see my updates. However, I don't necessarily care if other clients don't see my changes immediately.

aliostad commented 3 years ago

In your particular case, I do not see a reason for caching. A single client making a call, updating something it already knows has been updated: the spreadsheet's latest state is on the client anyway, so it does not need HTTP caching. Also, if you are saying that it needs to reload its state clean, then it is a single call: use no-cache in your request, which triggers validation. The server knows whether it has changed since and either returns the new data, refreshing your cache, or returns 304.
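
Roughly, that suggestion looks like this on the client (a sketch only; as noted further down the thread, CacheCow.Client did not actually honour this request header at the time, which is the bug raised at the end):

    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;

    // Send Cache-Control: no-cache on the single request that must reload cleanly;
    // a compliant cache then revalidates its stored copy with the origin server,
    // which returns either fresh data (refreshing the cache) or a 304.
    async Task<string> ReloadMyNumberAsync(HttpClient client)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, "my-number");
        request.Headers.CacheControl = new CacheControlHeaderValue { NoCache = true };

        var response = await client.SendAsync(request);
        return await response.Content.ReadAsStringAsync();
    }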

I am afraid it is not a typical scenario for CacheCow, or I would say for HTTP caching in general.

As for the HTTP spec, that statement is geared towards HTTP cache appliances rather than libraries, for the reasons I explained.

Hope this helps.

BrettJaner commented 3 years ago

Also, if you are saying that it needs to reload its state clean, then it is a single call: use no-cache in your request, which triggers validation. The server knows whether it has changed since and either returns the new data, refreshing your cache, or returns 304.

CacheCow.Client doesn't seem to support the no-cache request header: when I add it, CacheCow.Client still serves the response from cache in my scenario. It might be easy enough to add this functionality by extending the CachingHandler. Do you think something like this is safe to do?

    public class ExtendedCachingHandler : CachingHandler
    {
        public ExtendedCachingHandler()
        {
            var responseValidator = ResponseValidator;

            ResponseValidator = response =>
            {
                var result = responseValidator(response);

                if (result == ResponseValidationResult.OK
                    && response.RequestMessage.Headers.CacheControl != null
                    && response.RequestMessage.Headers.CacheControl.NoCache == true)
                {
                    return ResponseValidationResult.MustRevalidate;
                }

                return result;
            };
        }
    }
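
For completeness, here is a hypothetical wiring of that handler into the repro client from the top of the thread; assuming the override behaves as intended, a request sent with no-cache would now be revalidated rather than served from the stale cache entry.

    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;

    using (var client = new HttpClient(
        new ExtendedCachingHandler
        {
            InnerHandler = new HttpClientHandler()
        })
    {
        BaseAddress = new Uri("https://chrome-caching-bug-repro.azurewebsites.net")
    })
    {
        var request = new HttpRequestMessage(HttpMethod.Get, "my-number");
        request.Headers.CacheControl = new CacheControlHeaderValue { NoCache = true };

        // The override maps no-cache to MustRevalidate, so this GET should be
        // validated against the origin server instead of served from the stale entry.
        var myNumber = client.SendAsync(request).Result.Content.ReadAsStringAsync().Result;
        Console.WriteLine($"GET my-number (revalidated): {myNumber}");
    }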

aliostad commented 3 years ago

Wow, that's a bug! I am surprised it has not been raised earlier. Would you mind creating a new GitHub issue for it? Thanks a lot.