dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License

Allow Response Caching for Authorized Endpoints #56769

Open 6pac opened 2 months ago

6pac commented 2 months ago

Background and Motivation

When designing intranet-based, web-enabled database systems, which always require authorization and are accessed from secure company computers using an internal IP address, it would greatly improve speed to be able to cache some page content. There are often long lists of data, frequently 10+ MB, that change infrequently and currently must be sent with every page load. Sometimes these apps are used remotely over a slow VPN connection. This change would give developers the option to cache such data.

An alternative that may be suggested is AJAX partial loading; however, this is not much more efficient, and on slow connections it can severely affect responsiveness. The best model IMO is one with a long initial page load, followed by only short delays for small AJAX payloads and short subsequent page loads, i.e. cached data.

At the moment, the only way to achieve this in .NET is to build a custom version of the middleware, which is surely far more fraught.

Proposed API

namespace Microsoft.AspNetCore.ResponseCaching;

public class ResponseCachingOptions
{
+    /// <summary>
+    /// <c>true</c> if caching is allowed for authorized endpoints
+    /// </summary>
+    public bool AllowAuthorizedEndpoint { get; set; } = false;
}

Usage Examples

In Startup.cs:

public void ConfigureServices(IServiceCollection services) {
  ...
  services.AddResponseCaching(options => { options.AllowAuthorizedEndpoint = true; });
  ...

}

Alternative Designs

None. This proposal simply exposes a new member to enable missing functionality.

Risks

There are no explicit risks. It is a non-breaking change: the new option defaults to the existing behaviour and does not need to be specified.

In terms of implicit risk, the reason given for the lack of configurability around authorized endpoints to date is that cached authorized data is a security risk. I would argue that while this is often the case, it is not always the case. The developer should be able to configure the framework to achieve their own desired balance of security, network performance and device CPU/resource load.

To further protect cached data, it would also be possible to cache it in encrypted form and use JavaScript to decrypt it before parsing it as JSON, using an encryption key passed only with authorized page loads. This would trade client CPU load for network bandwidth, very likely a productive swap in terms of page load speed.

See PR to implement the full functionality: https://github.com/dotnet/aspnetcore/pull/56768

halter73 commented 1 month ago

It does seem pretty scary to enable something like this globally.

Is there a reason that you're proposing this API specifically for response caching and not output caching? Most people seem to prefer output caching because it will cache more often even if the client is a web browser that doesn't send Cache-Control headers explicitly allowing for stale content.

Output caching also allows you to specify custom policies globally or per-endpoint which allow caching even authenticated requests. @sebastienros might be able to provide more details.

6pac commented 1 month ago

@halter73 It depends on your niche - as I say, this application is used on an intranet. I think poor design - which is rife - is far more dangerous by leaking unwanted information in AJAX datasets within authenticated pages.

As to enabling it globally, (1) it's off by default, and (2) without adding cache headers specifically to an authorised endpoint, it won't be cached. I suppose the change could, however, be framed as a per-endpoint setting. If you think that would be more likely to succeed, I could look into a change to approach it that way instead.

Note that I am wanting the client (ie. web browser) to be able to cache the data from specific dedicated URLs/endpoints. From what I understand, output caching is server side, which I don't care about at all. Most of my apps have fewer than 100 users and run on roughly the third-from-bottom tier of AWS VM, costing maybe $60 a month; traffic and server load are almost negligible. These are internal-facing custom database solutions, not public-facing websites handling high traffic volumes.

If I've gotten the caching configuration wrong, please let me know - if there is any way to do what I want in the existing framework, I'm completely happy to use it.

Perhaps I'll run down my scenario.

Clearly all the static assets in my sites are already being cached. I came to this because I wanted to start caching specific dynamic response datasets.
My total page load for some apps is typically ~12 MB, and it's the same 11.5 MB of background data being loaded every time (for example, a list of all 4500 clients and 400 staff in the system); the actual page HTML and page-specific data can be measured in KB.
So this gives me two options, (1) to either move to AJAX loading or (2) to work out how to cache the dataset for a period of time (say 2 weeks).
I dislike AJAX loading for any but truly enormous datasets, simply because it kills page responsiveness, especially on slow connections (and I live and work in a remote area where occasionally we are using a satellite internet connection which is very slow; the yardstick I use for my designs is 'how well will it work over a 1 Mbps connection with a 1000 ms ping?'). I coded up a system that detects changes to the cached dataset since it was first generated: for example, we cache the user dataset at a dedicated URL, and for the next two weeks each page load contains only the user data that has changed since the cached dataset was read, which is then used to update the cached copy. It worked flawlessly, until I noticed that the file wasn't actually being cached, despite the cache headers.

My only other option would be, I suppose, to write the data as a .js file to the site and serve it up as static content, but that feels quite hacky, and I think would have more serious security risks associated with it, not to mention the locking and contention issues.

As I have said, I think the framework needs to acknowledge that there is a line where it needs to hand the responsibility over to the developers to do things the right way. We can't block useful features just because they could be misused.

halter73 commented 1 month ago

Note that I am wanting the client (ie. web browser) to be able to cache the data from specific dedicated URLs/endpoints. From what I understand, output caching is server side, which I don't care about at all.

The response caching middleware is also server-side which is why it's potentially dangerous. Not only will it return a 304 if the client sends the right If-None-Match or If-Modified-Since headers, it will also return a 200 with the cached response body given the right Cache-Control response headers unless the client sends request headers that prevent it.

https://github.com/dotnet/aspnetcore/blob/95039134856e8af85aeccd7d404ec96ec5bb73d4/src/Middleware/ResponseCaching/src/ResponseCachingMiddleware.cs#L183-L189

Furthermore, output cache middleware will also return a 304 instead of just returning a 200 with the cached response if given the right If-None-Match or If-Modified-Since headers.

https://github.com/dotnet/aspnetcore/blob/95039134856e8af85aeccd7d404ec96ec5bb73d4/src/Middleware/OutputCaching/src/OutputCacheMiddleware.cs#L283-L295

The primary difference between ASP.NET Core response caching and output caching is that output caching allows more configurable caching rules instead of basing caching decisions strictly on request headers.

  • Is typically not beneficial for UI apps such as Razor Pages because browsers generally set request headers that prevent caching. Output caching, which is available in ASP.NET Core 7.0 and later, benefits UI apps. With output caching, configuration decides what should be cached independently of HTTP headers.

https://learn.microsoft.com/aspnet/core/performance/caching/response?view=aspnetcore-8.0

And you can see that the "Authorization" header is specifically called out as one of the non-configurable headers that response caching depends on:

  • The Authorization header must not be present.

https://learn.microsoft.com/aspnet/core/performance/caching/middleware?view=aspnetcore-8.0#cfc

There are a lot of headers that you cannot make the response caching middleware ignore. The Authorization header is just one of them. Rather than adding global flags for each and every one of these headers, we introduced the new output caching middleware which gives you far more control of what does and doesn't get cached and defaults to serving cached results even if the client sends something like Cache-Control: no-cache as a request header.

However, output caching still allows you to respect Cache-Control: no-cache with a custom policy if you set OutputCacheContext.AllowCacheLookup = false for that request. It's just the defaults that are different from response caching. Otherwise, output caching is just the newer, more flexible version of response caching.

https://learn.microsoft.com/aspnet/core/performance/caching/output?view=aspnetcore-8.0#override-the-default-policy
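The custom policy route mentioned above can be sketched roughly as follows. This is a minimal sketch assuming .NET 8 and the Microsoft.AspNetCore.OutputCaching APIs; the class names and the policy name SharedAuthenticated are hypothetical. It lets GET/HEAD requests be cached even when the user is authenticated, and it respects a Cache-Control: no-cache request header by setting AllowCacheLookup to false for that request. The cache key does not vary by user, so it only fits data that is identical for every authorized user.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.OutputCaching;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Primitives;

namespace AspMvcApp {
  // Hypothetical policy: caches GET/HEAD responses even for authenticated users.
  // The cache key does NOT vary by user, so use it only for shared data.
  public sealed class SharedAuthenticatedCachePolicy : IOutputCachePolicy {
    public static readonly SharedAuthenticatedCachePolicy Instance = new();

    private SharedAuthenticatedCachePolicy() { }

    // Only safe, idempotent methods are cached.
    public static bool IsCacheableRequest(HttpRequest request) =>
      HttpMethods.IsGet(request.Method) || HttpMethods.IsHead(request.Method);

    // True when the client sent "Cache-Control: no-cache" and wants a fresh response.
    public static bool ClientRequestsFreshResponse(HttpRequest request) =>
      request.Headers.CacheControl.ToString().Contains("no-cache", StringComparison.OrdinalIgnoreCase);

    ValueTask IOutputCachePolicy.CacheRequestAsync(OutputCacheContext context, CancellationToken cancellation) {
      var request = context.HttpContext.Request;
      var cacheable = IsCacheableRequest(request);

      // Unlike the default policy, authenticated requests are not excluded here.
      context.EnableOutputCaching = true;
      // Skip the lookup (but still allow storing) when the client asks for no-cache.
      context.AllowCacheLookup = cacheable && !ClientRequestsFreshResponse(request);
      context.AllowCacheStorage = cacheable;
      context.AllowLocking = true;
      // Separate cache entries per query string, e.g. ?hash=...
      context.CacheVaryByRules.QueryKeys = "*";
      return ValueTask.CompletedTask;
    }

    ValueTask IOutputCachePolicy.ServeFromCacheAsync(OutputCacheContext context, CancellationToken cancellation) =>
      ValueTask.CompletedTask;

    ValueTask IOutputCachePolicy.ServeResponseAsync(OutputCacheContext context, CancellationToken cancellation) {
      var response = context.HttpContext.Response;
      // Never store responses that set cookies or that are not 200 OK.
      if (!StringValues.IsNullOrEmpty(response.Headers.SetCookie) ||
          response.StatusCode != StatusCodes.Status200OK) {
        context.AllowCacheStorage = false;
      }
      return ValueTask.CompletedTask;
    }
  }

  public static class SharedAuthenticatedCacheSetup {
    // Registered under a name so endpoints opt in individually; nothing is global.
    public static IServiceCollection AddSharedAuthenticatedCache(this IServiceCollection services) =>
      services.AddOutputCache(options =>
        options.AddPolicy("SharedAuthenticated", SharedAuthenticatedCachePolicy.Instance));
  }
}
```

An endpoint would opt in with [OutputCache(PolicyName = "SharedAuthenticated")]. The sketch also assumes app.UseOutputCache() is placed after app.UseAuthentication() and app.UseAuthorization(), so that unauthenticated requests are rejected before any cache lookup; registering it as a named policy rather than a base policy keeps it from applying to every endpoint.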

6pac commented 1 month ago

Thanks for the info, it will take me a little while to digest. I'm sorry if I misunderstand the fundamentals - I realise that your time is valuable and I don't want to waste it.
Coming from the database and desktop world, my knowledge of HTTP is informal and I'm not at all familiar with the nitty gritty of caching header types and behaviours.

I didn't realise that there were server-side security issues, I thought it was all about cached data for the authorised endpoint being stored on the client.
Surely though, if the page is subject to authorisation, an unauthorised user can't access it, regardless of cache settings? I've always assumed that the authorization is ahead of caching in the pipeline, otherwise this would be a huge security issue.

Also, can we just turn off server side caching altogether? I'm just a little puzzled as to why, when all I'm after is some client side caching, I'm being dragged into a discussion of the security issues around server side caching.
Surely the framework should be able to separate these two? If it's blocking client side caching because of server side caching issues, isn't that a fundamental architectural flaw?

From what I understand, client side caching works like this: 1) Server flags that an endpoint/url is allowed to be cached (via a header not containing NO-CACHE) 2) Browser stores (caches) endpoint content 3) Browser subsequently ignores requests to the endpoint/url and substitutes the cached data, for the caching period 4) This can be overridden by a hard refresh. From a server-side perspective, if updated data is issued, a different unique ID is added to the querystring so that the URL changes and the content is updated.
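Step 4 (changing the URL when the data changes) can be sketched with a content hash as the version token. This is an illustration under assumptions: the CacheBusting class is hypothetical, SHA-256 is one choice of hash among many, and the hash query parameter simply mirrors the one used by the controller later in this thread.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

namespace AspMvcApp {
  // Hypothetical helper: derives the query-string token from the dataset itself,
  // so any change to the data produces a new URL and the stale cached copy is
  // simply never requested again.
  public static class CacheBusting {
    // Lower-case hex SHA-256 of the UTF-8 text (64 characters).
    public static string ContentHash(string content) =>
      Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(content))).ToLowerInvariant();

    // e.g. "/Script/list-view-cache.js?hash=<64 hex chars>"
    public static string VersionedUrl(string path, string content) =>
      $"{path}?hash={ContentHash(content)}";
  }
}
```

Because each URL always maps to the same content, a long private max-age on that URL cannot serve outdated data; the page just links to a new URL once the hash changes.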

The only security issue here is the storage of the data by the browser. Other than the 'NO-CACHE' header flag, the server is not even involved in the caching process.

The issue here as I understand it is that (1) ASP NET adds the NO-CACHE flag to dynamic endpoints, (2) to circumvent that, one must turn on Response Caching, (3) Response Caching cannot be used for authorised endpoints.

6pac commented 1 month ago

@halter73 I've read up on your above points and it looks to me like they are 100% referring to server side caching. I'm starting to think that this reflects a fundamental architectural problem in the framework, ie. that logic for client and server side caching cannot be separated. I think there are some good points in my above reply. Could you respond?

Particularly:

halter73 commented 1 month ago

(1) ASP NET adds the NO-CACHE flag to dynamic endpoint

There are a lot of times ASP.NET Core components like the cookie authentication handler do set the Cache-Control and Pragma headers to no-cache, but it's not just any dynamic endpoint. Generally, it's for a good reason, like not wanting to cache a Set-Cookie header, which could be bad even for client caching if it causes a fresh cookie to be overwritten by a stale one, or not wanting to cache an error page.

(2) to circumvent that, one must turn on Response Caching

We'd recommend output caching over response caching for most scenarios these days, but there are other options. There's nothing stopping you from removing no-cache from every response inside of a HttpContext.Response.OnStarting callback, not that we'd recommend that. It would probably be best to determine exactly which component is adding no-cache and preventing that component from running if it's really unneeded.
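A narrowly scoped sketch of the OnStarting option, for illustration only, since it is explicitly not the recommended fix. NoCacheOverride, the /Script path prefix and the max-age value are hypothetical. It leaves any response carrying Set-Cookie untouched, which is exactly the case the cookie handler's no-cache is meant to protect.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Primitives;

namespace AspMvcApp {
  public static class NoCacheOverride {
    // Assumed scope: 200 responses under /Script that carry no-cache but no Set-Cookie.
    public static bool ShouldOverride(HttpContext context) =>
      context.Request.Path.StartsWithSegments("/Script") &&
      context.Response.StatusCode == StatusCodes.Status200OK &&
      StringValues.IsNullOrEmpty(context.Response.Headers.SetCookie) &&
      context.Response.Headers.CacheControl.ToString()
        .Contains("no-cache", StringComparison.OrdinalIgnoreCase);

    // Hypothetical extension; where it sits in the pipeline is an assumption to verify.
    public static IApplicationBuilder UseScriptNoCacheOverride(this IApplicationBuilder app, TimeSpan maxAge) =>
      app.Use(async (context, next) => {
        // OnStarting callbacks run just before the response headers are sent.
        context.Response.OnStarting(() => {
          if (ShouldOverride(context)) {
            context.Response.Headers.CacheControl = $"private,max-age={(long)maxAge.TotalSeconds}";
            context.Response.Headers.Remove("Pragma");
          }
          return Task.CompletedTask;
        });
        await next(context);
      });
  }
}
```

It would be registered with app.UseScriptNoCacheOverride(TimeSpan.FromDays(14)). Finding and disabling the component that adds no-cache, as suggested above, remains the cleaner route.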

(3) Response Caching cannot be used for authorised endpoints.

But output caching can be used for authorized endpoints if you really want to. But it's a security risk, so you have to go out of your way to allow it with a custom IOutputCachePolicy.

there seems to be no way to enable client side caching without server side caching, when these are entirely separate functionalities and should be treated using separate APIs/middleware

This I agree more with. I'm not sure that these should be entirely separate middleware considering configuring things like per-endpoint cache expiration policies would largely be the same. I cannot see a scenario where you would want to use a server cache over a client cache if the client sent a request with an If-None-Match header indicating their cache is up to date.

I do however agree it would be nice if the output caching middleware's support for client caching didn't rely on caching the entire response body on the server. Caching response bodies shouldn't be necessary just to respond appropriately with 304s based on what policy is configured, what ETags were sent in previous responses, and whether any cache entries were evicted. I talked to @sebastienros and he agrees. However, as far as we know, you're the first person to request this functionality. I wonder if everyone else is just writing custom middleware for this, there isn't a big demand, or people just don't know they want this.
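A minimal sketch of the client-only idea: answer conditional GETs with 304 using an ETag the app can compute cheaply (here, a content hash it already keeps), without storing any response body on the server. ClientOnlyCaching and its methods are hypothetical names; the header types come from Microsoft.Net.Http.Headers. RFC 9110 requires weak comparison for If-None-Match, which is why useStrongComparison is false.

```csharp
using System;
using Microsoft.AspNetCore.Http;
using Microsoft.Net.Http.Headers;

namespace AspMvcApp {
  // Hypothetical helper for client-only caching: the server keeps no response
  // body, only a cheaply computed validator (the dataset's content hash).
  public static class ClientOnlyCaching {
    // True when If-None-Match is "*" or lists a tag that weakly matches currentETag.
    public static bool IsNotModified(HttpRequest request, EntityTagHeaderValue currentETag) {
      foreach (var candidate in request.GetTypedHeaders().IfNoneMatch) {
        if (candidate.Equals(EntityTagHeaderValue.Any) ||
            candidate.Compare(currentETag, useStrongComparison: false)) {
          return true;
        }
      }
      return false;
    }

    // Writes ETag and a private Cache-Control with max-age; if the client's copy
    // is current, sets 304 and returns true so the caller can skip the body.
    public static bool TryWriteNotModified(HttpContext context, string contentHash, TimeSpan maxAge) {
      var etag = new EntityTagHeaderValue($"\"{contentHash}\"");
      var headers = context.Response.GetTypedHeaders();
      headers.ETag = etag;
      headers.CacheControl = new CacheControlHeaderValue { Private = true, MaxAge = maxAge };

      if (!IsNotModified(context.Request, etag)) {
        return false;
      }
      context.Response.StatusCode = StatusCodes.Status304NotModified;
      return true;
    }
  }
}
```

In an MVC action, the caller would return StatusCode(304) when TryWriteNotModified returns true, and produce the body as usual otherwise.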

This diverges quite a bit from the original issue title, "Allow Response Caching for Authorized Endpoints", though. Would you be fine if we renamed this issue to "Add support for client-only caching to output caching"? If not, please feel free to file a separate issue with that request. I don't think there's going to be any interest in supporting any kind of caching for authorized endpoints by default, server or client. But as I mentioned, it's at least possible to configure output caching to ignore whether or not the current user is authenticated.

6pac commented 1 month ago

Thanks for getting back. Again, I'll have to investigate IOutputCachePolicy.

Thought I should mention, I have found a way of only doing client side only caching with ResponseCaching:

 [ResponseCache(Duration = 6000, Location = ResponseCacheLocation.Client)]

Other ResponseCacheLocation values are All (client + server) and None (caching turned off). So for example, allowing authorized endpoints might be acceptable only for endpoints with the Client location. However the dependency injection model appears to make it very difficult to bring the configuration of the endpoint and the policy together.
I'm probably getting ahead of myself here, so I'll investigate more before posting again.

Re renaming the issue - be my guest.

Note that there are a lot of people with my scenario and echoing my sentiments out there in support posts, and I haven't seen anyone with a solution yet, so if a good workaround comes out of this, I'm happy to publicise it.

6pac commented 1 week ago

@halter73 after all this, it looks like this is all I need:

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc.Filters;
using Microsoft.Net.Http.Headers;

namespace AspMvcApp {
  // https://stackoverflow.com/questions/67901155/why-is-asp-net-core-setting-cache-control-headers-on-error-responses

  // Adds "Cache-Control: private,max-age=N": only the browser (not shared proxies)
  // may store the response, for N seconds.
  public class CacheControlAttribute : ActionFilterAttribute {
    public int DurationSec { get; set; } = 0;

    // Runs after the action method returns, before its result is written.
    public override void OnActionExecuted(ActionExecutedContext context) {
      SetCacheControlHeaders(context.HttpContext.Response);
    }

    private void SetCacheControlHeaders(HttpResponse response) {
      response.Headers[HeaderNames.CacheControl] = $"private,max-age={DurationSec}";
    }
  }
}

implemented like so:

using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;

namespace AspMvcApp.Controllers {
  [Authorize]
  [CacheControl(DurationSec = 2629746)]  // cache for ~1 month (average month = 2,629,746 s)
  public class ScriptController : Controller {
    [Route("Script/list-view-cache.js")]
    public IActionResult list_view_cache(string hash) {
      // ListViewCache is the app's own store of serialised datasets, keyed by hash.
      var scriptText = ListViewCache.GetLVFlexTableCacheItemByHash(hash).FlexTableBaseDataSerialised;
      return Content(scriptText, "text/javascript");
    }
  }
}

My immediate question is: why on earth was it so hard to find this information!? It is not discussed anywhere in MS documentation as far as I can see.
The obvious follow up question: is this really all there is to it, or am I missing something?

halter73 commented 5 days ago

If all you need is to add Cache-Control: $"private,max-age={DurationSec}" to responses produced by MVC, your solution is fine. Is there a particular reason you didn't use the [ResponseCache] attribute since you're just focused on MVC? That shouldn't have the limitation around authorized endpoints like the middleware does unless I'm missing something.
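A sketch of the [ResponseCache] alternative, applied to a variant of the ScriptController above. The attribute is an MVC filter that writes response headers, so it should not need AddResponseCaching()/UseResponseCaching(); the documented exception is VaryByQueryKeys, which does require the middleware. The action body is a placeholder for the app's own ListViewCache lookup, and the method name is hypothetical.

```csharp
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;

namespace AspMvcApp.Controllers {
  [Authorize]
  public class ScriptController : Controller {
    // MVC's response cache filter should write a header equivalent to
    // "Cache-Control: private,max-age=2629746"; Location.Client maps to "private".
    // No response caching middleware is needed for this.
    // Avoid VaryByQueryKeys here: that property requires the middleware.
    [ResponseCache(Duration = 2629746, Location = ResponseCacheLocation.Client)]
    [Route("Script/list-view-cache.js")]
    public IActionResult ListViewCacheScript(string hash) {
      // Placeholder for the app's own ListViewCache lookup by hash.
      var scriptText = $"/* cached dataset {hash} */";
      return Content(scriptText, "text/javascript");
    }
  }
}
```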

6pac commented 5 days ago

Sorry, are you saying that we can use the [ResponseCache] attribute without using the middleware? Is that documented anywhere?