aliostad / CacheCow

An implementation of HTTP Caching in .NET Core and 4.5.2+ for both the client and the server
MIT License
847 stars 172 forks source link

Client cached response content unable to read more than once #267

Closed alexheretic closed 2 years ago

alexheretic commented 2 years ago

Hello, using client caching I'm unable to read a cached response body stream. It was already read during the initial uncached request.

var response = await client.GetAsync(url);
// works if response is not cached, empty if response is cache
var body = await response.Content.ReadAsByteArrayAsync();

What's the correct way to re-read cached response body bytes?

aliostad commented 2 years ago

Thanks for raising.

Can you please provide a repro? Obviously this information is not enough to be able to investigate. The behaviour you see is wrong and you should be able to read regardless.

alexheretic commented 2 years ago

Here is a fuller reproduction

// <PackageReference Include="CacheCow.Client" Version="2.9.0" />

using System;
using System.Threading.Tasks;
using CacheCow.Client;

static class Program
{
    static void Main(string[] args)
    {
        var url = "https://webhooks.truelayer-sandbox.com/.well-known/jwks";
        var client = ClientExtensions.CreateClient();

        Task.Run(async () =>
        {
            // fetch for the first time
            var response = await client.GetAsync(url);

            var body = await response.Content.ReadAsByteArrayAsync();
            Console.WriteLine($"body.Length = {body.Length}");

            // later fetch again from cache
            var response2 = await client.GetAsync(url);

            var body2 = await response2.Content.ReadAsByteArrayAsync();
            Console.WriteLine($"body2.Length = {body2.Length}");

        }).GetAwaiter().GetResult();
    }
}

I'm running on net6.0 on Linux

$ dotnet run
body.Length = 718
body2.Length = 0

Interestingly if I run with var url = "https://code.jquery.com/jquery-3.3.1.slim.min.js" it seems to work fine

$ dotnet run
body.Length = 69917
body2.Length = 69917

Is it related to the headers of the jwks response? The headers should not be indicating that the response is not cachable. And even if they did I would then expect the client to just not cache, rather than silently provide an empty body in this way.

jwks response headers:

HTTP/2 200 
date: Tue, 22 Mar 2022 14:22:01 GMT
x-tl-correlation-id: 18919084-b81c-4212-abfe-aa798dd8829b
vary: accept-encoding
cache-control: public, max-age=86400
referrer-policy: no-referrer
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
strict-transport-security: max-age=63072000; includeSubDomains
cf-cache-status: DYNAMIC
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 6eff93b9bdd054e1-MAN
aliostad commented 2 years ago

Thanks! I should have enough to look into this 👍

aliostad commented 2 years ago

I am working on it and made some progress. It appears that it happens during serialisation/deserialisation of the response to be stored in cache/reading out of cache but why/when/how it happens I am still trying to figure out.

One key point seems to be chunked encoding. I wonder if serialisation/deserialisation fails when the original response was chunked encoded. I have a feeling MS cowboys have broken another thing perhaps trying to improve performance of chunked encoded messagess.

aliostad commented 2 years ago

Wow, this is a terrible bug in Microsoft's code - perhaps a behaviour changed recently but not sure. I have no confidence in them fixing it since it touches an area used probably only in CacheCow hence they would just simply ignore. Last time I raised a bug it took them 3 years to fix.

But I have found the workaround which will be added in the library in the next version which I will release in the next couple of days.

The issue with the response coming from your URL is this:

1) Chunked encoding: this trips up the serialisation. The workaround is to access ContenLength header as soon as receiving the response - it is crazy right?! But it results in the framework to perhaps prefetch the response. 2) No Content-Type header: this is really not right and I believe the reason I had not seen it perhaps because all endpoints would spit this header - although not absolutely mandatory by HTTP spec. I have added a workaround which adds a ContentType header if one does not exist.

My tests now pass. I shall publish the new version (2.10.0) in a couple of days when I get round to it.

Cheers

aliostad commented 2 years ago

Version 2.10.0 was released which should fix the issue. Please let me know if you still experience issues.

alexheretic commented 2 years ago

I tested v2.10.0 and everything looks good, thanks!