
Epic: IDistributedCache updates in .NET 9 #53255

Open adityamandaleeka opened 4 months ago

adityamandaleeka commented 4 months ago

Status: feedback eagerly sought

TL;DR

Problem statement

The distributed cache in ASP.NET (i.e. IDistributedCache) has not seen much development; it is inconvenient to use, lacks many desirable features, and is inefficient. We would like this API to be a "no-brainer", easy-to-get-right feature, making it desirable to use - giving better performance, and a better experience with the framework.

Typical usage follows the manual pattern sketched under "Inconvenient usage" below; being explicit about the problems:

Inconvenient usage

The usage right now is extremely manual; you need to:

  1. attempt to fetch the existing value (GetAsync), getting back a byte[] (or null)
  2. on a miss, invoke the underlying backend read (database, API, etc.)
  3. serialize the value to a right-sized byte[] and store it (SetAsync) with suitable expiration options
  4. on a hit, deserialize the byte[] back to the intended type
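For illustration, a sketch of that manual pattern as it exists today (Customer, LoadCustomerFromDatabaseAsync and the key are hypothetical, purely to show the shape):

byte[]? bytes = await cache.GetAsync(key, cancellationToken);
Customer? customer;
if (bytes is null)
{
    // cache miss: fetch from the backend, serialize, store
    customer = await LoadCustomerFromDatabaseAsync(id, cancellationToken);
    bytes = JsonSerializer.SerializeToUtf8Bytes(customer);
    await cache.SetAsync(key, bytes, new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
    }, cancellationToken);
}
else
{
    // cache hit: deserialize the right-sized byte[]
    customer = JsonSerializer.Deserialize<Customer>(bytes);
}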

Inefficiencies

The existing API is solely based on byte[]; the demand for right-sized arrays means no pooled buffers can be used. This broadly works for in-process memory-based caches, since the same byte[] can be returned repeatedly (although this implicitly assumes the code doesn't mutate the data in the byte[]), but for out-of-process caches this is extremely inefficient, requiring constant allocation.

Missing features

The existing API is extremely limited; the concrete and implementation-specific IDistributedCache implementation is handed directly to callers, which means there is no shared code reuse to help provide these features in a central way. In particular, there is no mechanism for helping with "stampede" scenarios - i.e. multiple concurrent requests for the same non-cached value, causing concurrent backend load for the same data, whether due to a cold-start empty cache, or key invalidation. There are multiple best-practice approaches that can mitigate this scenario, which we do not currently employ.

Likewise, we currently assume an in-process or out-of-process cache implementation, but caching almost always benefits from multi-tier storage, with a limited in-process (L1) cache supplemented by a separate (usually larger) out-of-process (L2) cache; this gives the "best of both worlds", where the majority of fetches are served efficiently from L1, but cold-start and less-frequently-accessed data still doesn't hammer the underlying backend, thanks to L2. Multi-tier caching can sometimes additionally exploit cache-invalidation support from the L2 implementation, to provide prompt L1 invalidation as required.

This epic proposes changes to fill these gaps.

Current code layout

At the moment the code is split over multiple components, in the main runtime, ASP.NET, and external packages (only key APIs shown):

This list is not exhaustive - other 3rd-party and private implementations of IDistributedCache exist, and we should avoid breaking the world.

Proposal

The key proposal here is to add a new, more focused caching abstraction, HybridCache, in Microsoft.Extensions.Caching.Abstractions; this API is designed to act more as a read-through cache, building on top of the existing IDistributedCache implementations and providing all the implementation details required for a rich experience. Additionally, while simple defaults are provided for the serializer, it is an explicit aim to make such concerns fully configurable, allowing for JSON, protobuf, XML, etc. serialization as appropriate to the consumer.

namespace Microsoft.Extensions.Caching.Distributed;

public abstract class HybridCache // default concrete impl provided by service registration
{
    protected HybridCache() { }

    // read-thru usage
    public abstract ValueTask<T> GetOrCreateAsync<TState, T>(string key, TState state, Func<TState, CancellationToken, ValueTask<T>> callback, HybridCacheEntryOptions? options = null, ReadOnlyMemory<string> tags = default, CancellationToken cancellationToken = default);
    public virtual ValueTask<T> GetOrCreateAsync<T>(string key, Func<CancellationToken, ValueTask<T>> callback,
        HybridCacheEntryOptions? options = null, ReadOnlyMemory<string> tags = default, CancellationToken cancellationToken = default)
    { /* shared default implementation uses TState/T impl */ }

    // manual usage
    public abstract ValueTask<(bool Exists, T Value)> GetAsync<T>(string key, HybridCacheEntryOptions? options = null, CancellationToken cancellationToken = default);
    public abstract ValueTask SetAsync<T>(string key, T value, HybridCacheEntryOptions? options = null, ReadOnlyMemory<string> tags = default, CancellationToken cancellationToken = default);

    // key invalidation
    public abstract ValueTask RemoveKeyAsync(string key, CancellationToken cancellationToken = default);
    public virtual ValueTask RemoveKeysAsync(ReadOnlyMemory<string> keys, CancellationToken cancellationToken = default)
    { /* shared default implementation uses RemoveKeyAsync */ }

    // tag invalidation
    public virtual ValueTask RemoveTagAsync(string tag, CancellationToken cancellationToken = default)
    { /* shared default implementation uses RemoveTagsAsync */ }
    public virtual ValueTask RemoveTagsAsync(ReadOnlyMemory<string> tags, CancellationToken cancellationToken = default) => default;
}

Notes:

Usage of this API is then via a read-through approach using a lambda; the simplest (but slightly less efficient) approach would simply be:

// HybridCache injected via DI
var data = await cache.GetOrCreateAsync(key, _ => /* some backend read */, [expiration etc], [cancellation]);

In this simple usage, it is anticipated that "captured variables" etc. are used to convey the additional state required, as is common for lambda scenarios. A second "stateful" API is provided for more advanced scenarios where the caller wishes to trade convenience for efficiency; this usage is slightly more verbose, but will be immediately familiar to the users who would want this feature:

// HybridCache injected via DI
var data = await cache.GetOrCreateAsync(key, (some state here), static (state, _) => /* some backend read */, [expiration etc], [cancellation]);

This has been prototyped and works successfully with type inference etc.
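For illustration, a filled-in version of the stateful overload might look like the following; the state tuple, dbContext and GetCustomerAsync are hypothetical, purely to show the shape:

// the static lambda cannot capture locals, so the state tuple carries
// everything the callback needs - no closure allocation per call
var customer = await cache.GetOrCreateAsync(
    $"customer:{id}",
    (id, dbContext),
    static async (state, token) => await state.dbContext.GetCustomerAsync(state.id, token),
    cancellationToken: ct);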

The implementation (see later) deals with all the backend fetch, testing, serialization etc aspects internally.

(in both examples, the "discard" (_) conveys the CancellationToken for the backend read; to use it, give the lambda parameter a name instead of discarding it)

An internal implementation of this API would be registered and injected via a new AddHybridCache API (Microsoft.Extensions.Caching.Abstractions):

namespace Microsoft.Extensions.Caching.Distributed;

public static class HybridCacheServiceExtensions
{
    public static IServiceCollection AddHybridCache(this IServiceCollection services, Action<HybridCacheOptions> setupAction)
    {...}

    public static IServiceCollection AddHybridCache(this IServiceCollection services)
    {...}
}
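As a sketch, wiring this up over a Redis backend might look like the following; AddStackExchangeRedisCache is the existing backend registration API, while AddHybridCache is the new API proposed here:

var builder = WebApplication.CreateBuilder(args);

// existing L2 backend registration (Microsoft.Extensions.Caching.StackExchangeRedis)
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
});

// new API from this proposal, layered over whatever IDistributedCache is registered
builder.Services.AddHybridCache();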

The internal implementation behind this would receive IDistributedCache for the backend, as it exists currently; this means that the new implementation can use all existing distributed cache backends. By default, AddDistributedMemoryCache is also assumed and applied automatically, but it is intended that this API be effective with arbitrary IDistributedCache backends such as Redis, SQL Server, etc. However, to address the issue of byte[] inefficiency, an entirely optional new API is provided and tested for; if the backend implements it, lower-allocation usage is possible. This follows the pattern used for output-cache in .NET 8:

namespace Microsoft.Extensions.Caching.Distributed;

public interface IBufferDistributedCache : IDistributedCache
{
    ValueTask<CacheGetResult> GetAsync(string key, IBufferWriter<byte> destination, CancellationToken cancellationToken);
    ValueTask SetAsync(string key, ReadOnlySequence<byte> value, DistributedCacheEntryOptions options, CancellationToken cancellationToken);
}

public readonly struct CacheGetResult
{
    public CacheGetResult(bool exists);
    public CacheGetResult(DateTime expiry);

    public CacheGetResult(TimeSpan expiry);

    public bool Exists { get; }
    public TimeSpan? ExpiryRelative { get; }
    public DateTime? ExpiryAbsolute { get; }
}

(the intent of the expiry members here is to convey expiration in the most appropriate way for the backend, relative vs absolute, although only one can be specified; the internals are an implementation detail, likely to use an overlapped 8 bytes for the DateTime/TimeSpan, with a discriminator)

In the event that the backend cache implementation does not yet implement this API, the byte[] API is used instead, which is exactly the status quo, so: no harm. The purpose of CacheGetResult is to allow the backend to convey backend expiration information, relevant for L1+L2 scenarios (design note: async precludes out TimeSpan?; a tuple-type result would be simpler, but is hard to tweak later). The expiry is entirely optional and some backends may not be able to convey it, and we need to handle its absence when IBufferDistributedCache is not supported - in either event, the inbound expiration relative to now will be assumed for L1; not ideal, but the best we have.
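To make the detection concrete, a minimal sketch of the runtime check described above (cache, key and cancellationToken assumed in scope; ArrayBufferWriter stands in for what would likely be a pooled writer):

if (cache is IBufferDistributedCache buffered)
{
    // buffer path: the payload lands in a reusable writer, no right-sized byte[]
    var writer = new ArrayBufferWriter<byte>();
    CacheGetResult result = await buffered.GetAsync(key, writer, cancellationToken);
    if (result.Exists)
    {
        // deserialize from writer.WrittenMemory
    }
}
else
{
    // legacy path: exactly the status quo
    byte[]? payload = await cache.GetAsync(key, cancellationToken);
}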

Serialization

For serialization, a new API is proposed, designed to be trivially implemented by most serializers - again, preferring modern buffer APIs:

namespace Microsoft.Extensions.Caching.Distributed;

public interface IHybridCacheSerializer<T>
{
    T Deserialize(ReadOnlySequence<byte> source);
    void Serialize(T value, IBufferWriter<byte> target);
}
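As a sketch of how naturally a modern serializer maps onto this shape, a System.Text.Json-based implementation (illustrative only, not necessarily the inbuilt one) could be:

using System.Buffers;
using System.Text.Json;

internal sealed class SystemTextJsonSerializer<T> : IHybridCacheSerializer<T>
{
    public T Deserialize(ReadOnlySequence<byte> source)
    {
        // Utf8JsonReader consumes multi-segment buffers directly, no copy needed
        var reader = new Utf8JsonReader(source);
        return JsonSerializer.Deserialize<T>(ref reader)!;
    }

    public void Serialize(T value, IBufferWriter<byte> target)
    {
        // Utf8JsonWriter writes straight into the supplied buffer
        using var writer = new Utf8JsonWriter(target);
        JsonSerializer.Serialize(writer, value);
    }
}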

Inbuilt handlers would be provided for string and byte[] (and possibly BinaryData, if references allow); an extensible serialization configuration API supports other types - by default, an inbuilt object serializer using System.Text.Json would be assumed, but it is intended that alternative serializers can be provided globally or per-type. This is likely to be for more bandwidth-efficient scenarios, such as protobuf (Google.Protobuf or protobuf-net), but could also be to help match pre-existing serialization choices. While manually registering a specific IHybridCacheSerializer<Foo> should work, it is also intended to generalize the problem of serializer selection, via an ordered set of serializer factories, specifically by registering some number of:

namespace Microsoft.Extensions.Caching.Distributed;

public interface IHybridCacheSerializerFactory
{
    bool TryCreateSerializer<T>([NotNullWhen(true)] out IHybridCacheSerializer<T>? serializer);
}

By default, we will register a specific serializer for string, and a single factory that uses System.Text.Json, however external library implementations are possible, for example:

namespace Microsoft.Extensions.Caching.Distributed;

[SuppressMessage("ApiDesign", "RS0016:Add public types and members to the declared API", Justification = "demo code only")]
public static class ProtobufDistributedCacheServiceExtensions
{
    public static IServiceCollection AddHybridCacheSerializerProtobufNet(this IServiceCollection services)
    {
        ArgumentNullException.ThrowIfNull(services);
        services.AddSingleton<IHybridCacheSerializerFactory, ProtobufNetSerializerFactory>();
        return services;
    }

    private sealed class ProtobufNetSerializerFactory : IHybridCacheSerializerFactory
    {
        public bool TryCreateSerializer<T>([NotNullWhen(true)] out IHybridCacheSerializer<T>? serializer)
        {
            // in real implementation, would use library rules
            if (Attribute.IsDefined(typeof(T), typeof(DataContractAttribute)))
            {
                serializer = new ProtobufNetSerializer<T>();
                return true;
            }
            serializer = null;
            return false;
        }
    }
    internal sealed class ProtobufNetSerializer<T> : IHybridCacheSerializer<T>
    {
        // in real implementation, would use library serializer
        public T Deserialize(ReadOnlySequence<byte> source) => throw new NotImplementedException();

        public void Serialize(T value, IBufferWriter<byte> target) => throw new NotImplementedException();
    }
}

The internal implementation of HybridCache would look up the serializer for T as needed, caching locally to avoid constantly using the factory API.
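A sketch of what that local caching could look like (field names and the factory list are illustrative, not the actual internals):

private readonly ConcurrentDictionary<Type, object> _serializers = new();
private readonly IHybridCacheSerializerFactory[] _factories; // from DI, in registration order

internal IHybridCacheSerializer<T> GetSerializer<T>()
    => (IHybridCacheSerializer<T>)_serializers.GetOrAdd(typeof(T), _ =>
    {
        // first factory that can handle T wins; the result is cached per type
        foreach (var factory in _factories)
        {
            if (factory.TryCreateSerializer<T>(out var serializer))
            {
                return serializer;
            }
        }
        throw new InvalidOperationException($"No serializer available for {typeof(T)}");
    });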

Additional functionality

The internal implementation of HybridCache should also:

Note that it is this additional state for stampede and L1/L2 scenarios (and the serializer choice, etc) that makes it impractical to provide this feature simply as extension methods on the existing IDistributedCache.
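For context, a common stampede mitigation is to coalesce concurrent requests for the same key onto a single in-flight operation; a minimal sketch follows (illustrative names, not the actual HybridCache internals - note that ConcurrentDictionary.GetOrAdd can still invoke the factory more than once under a race, so a production version needs more care):

using System.Collections.Concurrent;

internal sealed class StampedeGate<T>
{
    private readonly ConcurrentDictionary<string, Task<T>> _inFlight = new();

    // all concurrent callers for the same key await the same backend fetch
    public Task<T> RunAsync(string key, Func<Task<T>> fetch)
        => _inFlight.GetOrAdd(key, async k =>
        {
            try
            {
                return await fetch();
            }
            finally
            {
                _inFlight.TryRemove(k, out _); // allow later refreshes
            }
        });
}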

The new invalidation API is anticipated to be something like:

namespace Microsoft.Extensions.Caching.Distributed;

public interface IDistributedCacheInvalidation : IDistributedCache
{
    event Func<string, ValueTask> CacheKeyInvalidated;
}

(the exact shape of this API is still under discussion)

When this is detected, the event would be subscribed to perform L1 cache invalidation from the backend.
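In sketch form (assuming an IMemoryCache-style L1 named l1Cache; the wiring names are illustrative):

// detect the optional invalidation capability on the registered backend
if (distributedCache is IDistributedCacheInvalidation invalidation)
{
    invalidation.CacheKeyInvalidated += key =>
    {
        l1Cache.Remove(key); // evict the stale in-process (L1) entry
        return default;      // ValueTask: nothing asynchronous needed here
    };
}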

Additional things to be explored for HybridCacheOptions:

Additional modules to be enhanced

To validate the feature set, and to provide the richest experience:

Open issues

aKzenT commented 4 months ago

I know it's difficult to change the existing interface, but depending on the other changes planned in this epic, it would be great if we could have distributed cache methods that were more performance-oriented, working with Span instead of byte[].

A common scenario for using the cache is saving serialized objects or large texts encoded as UTF-8. Requiring byte[] usually means copying this data at least once.

Similar issues exist when reading from the cache which also returns a byte[] and thus does not allow for using rented buffers or similar optimizations.

As the cache is often used several times per request in any non-trivial web application (e.g. session store, query cache, response cache), optimizations here would really pay off.

mgravell commented 4 months ago

Yup. Totally understand, @aKzenT , and that's part of the prototype. Will update with more details of the proposed API as it evolves, but the short version here is:

mgravell commented 4 months ago

@aKzenT please see updated body

Tornhoof commented 4 months ago

As for L1+L2 caching, you might want to talk to the developers of MS FASTER https://github.com/microsoft/FASTER, which has L1+L2 support, where L2 is not strictly out of process - more like out of main memory (disk-based or Azure-based, if I remember correctly). From my own experience with MS FASTER, it is not necessarily easy to configure properly, but it covers a lot of the functionality for L1/L2.

mgravell commented 4 months ago

@Tornhoof aye, FASTER has come up more than a few times; because of the setup etc required, I doubt we can reasonably make that a default implementation; the most pragmatic solution there might be to provide an IDistributedCache (i.e. L2) implementation that is FASTER-based, and leave L1+L2 disabled; that would 100% be something I'd love to see done, but it doesn't need to be critical-path for this epic

aKzenT commented 4 months ago

@mgravell thank you a lot for the update. I'm seeing a lot of things addressed that I've missed in the past, working on multi-server web apps.

For L1+L2 Caching we have been quite happy with https://github.com/ZiggyCreatures/FusionCache in the past which is also built upon IDistributedCache. I remember that there were some features missing from IDistributedCache that made it hard to implement advanced scenarios in that library. So I would like to invite @jodydonetti to this discussion as well as he can probably best comment on these issues.

One thing I remember that was missing was being able to modify the cache entry options (i.e. life time) of a cache entry without going through a Get/Set cycle. Being able to modify the lifetime allows for some advanced scenarios like invalidating a cache entry from the client (e.g. you received a webhook notifying you about changes in data) or reducing the time to allow things like stale results.

Another thing related to cache invalidation, that is not really possible with the current API in an efficient way, is the removal/invalidation of a group of related cache entries. Let's say you have a cache for pages of a CMS system with each page being one entry. The CMS informs you about changes via a web hook, which invalidates the cache for all pages. Directly refreshing all pages might be expensive, so you would rather refresh them individually on demand. So you want to invalidate all page entries in the cache, but there is no way to get the list of entries from the cache, nor is there a good way to delete the entries. Our solution was to build this functionality ourselves using a Redis Set that manages the related keys, then iterate through these keys and remove them one by one. But it felt very hacky, as you cannot even use the same Redis connection that the distributed cache uses, as far as I remember.

mgravell commented 4 months ago

@aKzenT re individual cache invalidation: there is a remove API that is meant to serve that function, but it doesn't allow modifying the options; I'd love to understand the need there further

re group cache invalidations: that sounds a lot like the "tag" feature of output-cache, i.e. you associate entries with zero, one or more tags, and then you can invalidate an entire tag, which nukes everything associated; the problem is: that tracking still needs to go somewhere, and it isn't necessarily an inbuilt feature of the cache backend - it was a PITA to implement reasonably on redis without breaking the cluster distribution, for example (ask me how I know!). It also isn't something that can fit in the existing IDistributedCache backend without significant work. Maybe there is some capacity there if we simplified to "zero or one" tags, but even then... I'm not sure that is something we can tackle in this epic, but I'm open to being wrong there!

mgravell commented 4 months ago

Re FusionCache: that isn't one I've seen before, but glancing at the homepage, I see that the fundamental design is pretty similar (some differences, but: very comparable). There is a common theme in these things - indeed, a lot of inspiration (not actual code) in the proposal draws on another implementation of the same that we had when I was at Stack Overflow (in that version, we also had some Roslyn analyzers which complained about inappropriate captured / ambient usage - very neat!). My point: lots of approaches converging around the same API shape.

aKzenT commented 4 months ago

@mgravell we had the same experience implementing our invalidation logic for a redis cluster setup. It's really hard to get right. I would not expect the design to provide a complete solution to this issue, but maybe there is some way that would make it possible for other libraries to support that while building on top of IDistributedCache. Maybe @jodydonetti has an idea how that could work.

As for modifying the options, in the case of FusionCache there is the possibility to allow stale entries, which are treated as expired, but still available in case the backend is not available. For these cases there is a separate timeout for how long you want to allow a result to be stale. The TTL that is sent to the underlying IDistributedCache is then the initial TTL plus the additional stale timeout. So when you invalidate an entry, but still want to allow stale results, you cannot simply delete the entry. Instead you would want to update the timeout to be equal to the stale timeout. Hope that makes sense.

mgravell commented 4 months ago

Yep, very familiar with the idea - it is conceptually related to the "eager pre-fetch" mentioned above - with two different TTLs with slightly different meanings

mgravell commented 4 months ago

via twitter:

> Yes, it does lack a proper API; there is too much redundant code we are required to write, or we maintain a library on our end. I used the DistributedCache code snippet provided in one of the .NET blog posts by you, and it is definitely going to be nice to have these features.

This is referring to this repo, which offered some of the highlights of this proposal as extension methods, but without any of the "meat" that drives the additional functionality.

aKzenT commented 4 months ago

One thing I'm wondering is if the choice to put the generic type parameter on the interface rather than on the methods might be limiting in some cases, and would require some classes to configure and inject multiple IDistributedCache instances. I'm not sure if that is really a problem, but it would be nice to learn why you went for that choice, which differs from other implementations that I have seen.

mgravell commented 4 months ago

@aKzenT fair question, and it isn't set in stone, but - reasons:

  1. to allow the serializer to be injected, although that might also be possible by taking IServiceProvider
  2. to allow the callback signature to be inferred in both the stateless and stateful case, although this might also be possible with a <,> method

Tell you what, I'll branch my branch and try it the other way. Although then I need a new name... dammit!

mgravell commented 4 months ago

@aKzenT ^^^ isn't terrible; will add notes above - we can probably go that way; I probably hadn't accounted for the improvements in generic handling in the last few C# versions (especially with delegates); in particular, I didn't have to change the usage code at all - see L107 for entirety of the usage update

I will benchmark the perf of GetService() as it applies here, but to be honest: in any reasonable L1 cache-hit scenario, I would expect the overall performance to be dominated by the deserialize; and in any L2 or backend scenario, we should anticipate the GetService() to be a rounding error compared to the L2/backend work (if it isn't: why are we caching?)

danielmarbach commented 4 months ago

Would it be an option to use change tokens for cache key invalidation instead of the event type currently proposed?

mgravell commented 4 months ago

@danielmarbach in reality, I'm not sure that is feasible in this scenario; the example uses shown for that API seem pretty low-volume and low issuance frequency; file config changes etc, but if we start talking about cache: we're going to need as many change tokens as we have cached entries, and crucially: the backend layer would need to know about them; I'm mentally comparing that to how redis change notifications can work, and to do that efficiently: we don't want to store anything extra, especially at all the backend layers. Given that all we actually want/need here is the string, this seems like going a very long way out of shape, to reuse an API for the sake of it. Happy to keep considering alternatives if I'm missing something, though! Please keep it coming, that's the entire point of this!

aKzenT commented 4 months ago

@danielmarbach I believe it should be fairly easy to implement a GetChangeToken(string key) method as an extension method that subscribes to the event and listens to changes to the specific key. The other way around is harder. That being said, I'm a fan of ChangeTokens and IIRC, MemoryCache uses change tokens to invalidate entries, so there might be some value in providing such an extension method directly in the framework in addition to the event @mgravell .
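A rough sketch of the extension @aKzenT describes (illustrative only - a real version would also need to unsubscribe, otherwise every token leaks a handler):

using Microsoft.Extensions.Primitives;

public static class DistributedCacheInvalidationExtensions
{
    // bridge the proposed event to the IChangeToken model for a single key
    public static IChangeToken GetChangeToken(this IDistributedCacheInvalidation cache, string key)
    {
        var cts = new CancellationTokenSource();
        cache.CacheKeyInvalidated += invalidatedKey =>
        {
            if (invalidatedKey == key)
            {
                cts.Cancel(); // fires the change token once
            }
            return default;
        };
        return new CancellationChangeToken(cts.Token);
    }
}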

normj commented 4 months ago

To add to your list of "Current Code Layout" we have an implementation at AWS for DynamoDB as backend. https://github.com/awslabs/aws-dotnet-distributed-cache-provider

aKzenT commented 4 months ago

> @aKzenT ^^^ isn't terrible; will add notes above - we can probably go that way; I probably hadn't accounted for the improvements in generic handling in the last few C# versions (especially with delegates); in particular, I didn't have to change the usage code at all - see L107 for entirety of the usage update
>
> I will benchmark the perf of GetService() as it applies here, but to be honest: in any reasonable L1 cache-hit scenario, I would expect the overall performance to be dominated by the deserialize; and in any L2 or backend scenario, we should anticipate the GetService() to be a rounding error compared to the L2/backend work (if it isn't: why are we caching?)

If you want to go that route, is there a particular reason why you want ICacheSerializer to be generic instead of just its methods being generic? I feel like for the basic scenario of System.Text.Json, this can just be a singleton. If someone needs to differentiate serialization by type it should be easy to do using a composite style pattern.

Instead of configuring caching and serialization by type, I would rather have the ability to have named instances that I can configure in addition to a default instance, similar to how HttpClientFactory works. FusionCache allows this by the way, if you are interested in an example.

aKzenT commented 4 months ago

> Instead of configuring caching and serialization by type, I would rather have the ability to have named instances that I can configure in addition to a default instance, similar to how HttpClientFactory works. FusionCache allows this by the way, if you are interested in an example.

Keyed Services might make that easy. Although I'm not sure if there is an easy way to register both a keyed cache and a keyed serializer and tell the DI system that it should use the serializer with the same key when injecting into the cache.

niemyjski commented 4 months ago

I never liked the name of it to start with because I might want an in memory, hybrid or distributed cache and the abstraction doesn't change. I also don't like how expiration / expiration strategy is not present (possibly it is with extension methods, I haven't looked in a long time). I also don't feel very strongly about adding generics to the interface because a cache can store all different kinds of shapes. We probably should slim down our default interface, but this is what we've found useful over the past decade: https://github.com/FoundatioFx/Foundatio#caching

mgravell commented 4 months ago

> To add to your list of "Current Code Layout" we have an implementation at AWS for DynamoDB as backend. https://github.com/awslabs/aws-dotnet-distributed-cache-provider

Great to know, thanks! The list was never meant to be exhaustive - I'll add and clarify. The key takeaway is: we want to actively avoid breaking any implementations; the new functionality should happily work against the existing backend, with them using the additional / optional extensions if appropriate.

unaizorrilla commented 4 months ago

Hi

For serialization using System.Text.Json, it would be a good idea to support a JsonSerializerContext. Could the user set the JsonSerializerContext to be used by ICacheSerializer in some way?

mgravell commented 4 months ago

@unaizorrilla I'm working on serialization config currently; will keep that in mind - a good outcome might be "if you do nothing, you get context-free JSON; if you explicitly specify the serializer, that option is available to you"

mgravell commented 4 months ago

from twitter

> Don't you think it would be better to be able to set the expiration policy (abs expiry/sliding expiry) globally?

A default options object isn't unreasonable - AbsoluteExpiration is obviously a no-go, but AbsoluteExpirationRelativeToNow plus sliding are usable - maybe something for TypedDistributedCacheOptions (or whatever that name becomes)

amoerie commented 4 months ago

> @aKzenT re individual cache invalidation: there is a remove API that is meant to serve that function, but it doesn't allow modifying the options; I'd love to understand the need there further
>
> re group cache invalidations: that sounds a lot like the "tag" feature of output-cache, i.e. you associate entries with zero, one or more tags, and then you can invalidate an entire tag, which nukes everything associated; the problem is: that tracking still needs to go somewhere, and it isn't necessarily an inbuilt feature of the cache backend - it was a PITA to implement reasonably on redis without breaking the cluster distribution, for example (ask me how I know!). It also isn't something that can fit in the existing IDistributedCache backend without significant work. Maybe there is some capacity there if we simplified to "zero or one" tags, but even then... I'm not sure that is something we can tackle in this epic, but I'm open to being wrong there!

I just wanted to add my vote for grouped cache invalidation. We have a similar manually implemented tag based bookkeeping system where each cache value is associated with a tag, and it is possible to expire all cached values for a tag. We aggressively cache some rarely updated SQL tables, but in the event they are updated, we need to evict all cached data of that table, regardless of which specific query fetched its data.

P.S. I could definitely live with the restriction that one cached value can only be associated with one tag.

Misiu commented 3 months ago

We also have a custom caching mechanism based on Redis and https://github.com/thepirat000/CachingFramework.Redis. We assign multiple tags to each entity or response using https://github.com/thepirat000/CachingFramework.Redis?tab=readme-ov-file#tagging-mechanism. It would be hard to live with only a single tag per cached value, because sometimes business scenarios require clearing the cache per user (all cached values of a specific user) or per table.

mgravell commented 3 months ago

> We also have a custom caching mechanism based on Redis...

Glancing at that, it sounds very similar to the multi-tag aspect of output-cache, so: I'm very familiar. The awkward bit (as you no doubt know) then becomes GC when the target keys expire, which we dealt with in output cache using sorted sets for the tag membership (so we can just ZRANGE, and we have the prior implementation to refer to). So the "how" is fine - the trickier bit becomes hooking that into the backend, since that wasn't part of IDistributedCache. But if we want to fix that: now is probably our best chance, which we could do as part of the second auxiliary backend API. I can't promise we'll add this right now, but I'm adding it to my consideration list.

onionhammer commented 3 months ago

Why not call the GetAsync-with-a-callback method GetOrSetAsync, like IMemoryCache?

One thing would be cool would be to take a page out of react-query's book; Pass an array as the key, and then use the first few items as a 'prefix' for bulk cache eviction, for example:

// get or set from the cache, fallback to database
await myCache.GetOrSetAsync([ "partner", partnerId, "users" ], () => db.GetUsersAsync(), expires: TimeSpan.FromMinutes(5));

// evict partner and everything underneath when, for example partner is deleted, i.e.
// - ["partner", "1234"]
// - ["partner", "1234", "users"]
// - ["partner", "1234", "preferences"]
await myCache.RemoveAsync(["partner", partnerId]);

mgravell commented 3 months ago

I haven't updated this page, but the memory-cache naming has already come up - indeed that is the plan.

Re bulk evictions: that's essentially the "tags" mentioned previously; the question here then becomes: can we introduce that metaphor cleanly since the existing backends don't have that capability? I'm still working on that question.


glen-84 commented 3 months ago

Regarding bulk eviction/invalidation:

  • Since tagging can be complicated, and some backends may not support it, could it be implemented as an additional interface?

    RedisCache : IDistributedCache, ICacheWithTaggableItems // naming is hard

  • In addition to tagging (or alternatively), invalidation by key prefix is also useful.

    cache.RemoveByKeyPrefix("Some.Thing.");

mgravell commented 3 months ago

Invalidation by key prefix is simply not a practical design in a system with backends that are not required to support that kind of lookup - and even on those that do: this could be an O(N) operation.

In particular:


glen-84 commented 3 months ago

Again, could this be something that each cache implementation "advertises" via an interface? It seems unfortunate to throw out useful features just because not every implementation supports them.

aKzenT commented 3 months ago

@glen-84 if the final implementation supports multiple tags for invalidation, I can't think of a scenario where you would want to remove by prefix and that would not work equally well with tags. Or do you have an example?

glen-84 commented 3 months ago

@aKzenT

> (or alternatively)

It could be an alternative, if support for tagging proves to be more difficult to implement.

NateMerritt commented 3 months ago

There are some scenarios that I've found valuable to support in an IDistributedCacheService over the past 10 years which I don't see mentioned in the proposed API changes above:

  1. Bulk support for getting or setting more than one key/value pair in a single network operation (instead of RBAR). Yes, there may be performance reasons (depending upon the implementation) to not get/put too many cache items at once but it is still useful for a low number of keys. Methods we've supported: GetItems(), PutItems(), RemoveItems(), and GetOrCreateItems().
  2. Atomic (single network call) increment/decrement support. This is very useful for rate-limiting implementations and other scenarios where multiple network operations would cause race conditions.
  3. A PutItemMode flag when writing keys into the cache: public enum PutItemMode { Set, SetIfNotExists, SetIfExists }.
  4. The ability to get the TTL of a cache item by key (only returns the TTL; doesn't return the value).

We also support the ability to inject a serializer/deserializer for use by the DistributedCacheService and have written several implementation options (Newtonsoft.Json, protobuf-net, gzip, etc.). One feature I implemented in this serializer is the ability for the serializer (in conjunction with the cache implementation) to distinguish between the following states, which allows for better optimization of any follow-up DB calls to get cache values from the source:

mgravell commented 3 months ago

Hi @NateMerritt

Very welcome input! Thoughts:

  1. interesting, but would require quite a nuanced implementation in the byte[]-free implementation; PutItems<T>() (for single T at a time) and RemoveItems() are more possible - presumably we could have enhanced APIs for the new optional backend, and just loop for the legacy backend; let me have a think
  2. I totally agree, but I simply don't see a way of pushing that into this API in a generalized way - the legacy backend simply doesn't allow it; IMO this is best considered as a separate API and backend; there's another ticket about that somewhere... I'll see if I can find it
  3. sounds a lot like When in SE.Redis, so: I am familiar with the concept; I was hoping to reuse the existing DistributedCacheEntryOptions for the options API; I wonder if we could squeeze such a thing in there, but again: it could only apply on the new optional backend
  4. that is a useful idea; it would again be limited by backends, though, which is a bit of a problem; I'll see what can be done, but it does raise the issue that for L1+L2 we should ideally also respect the L2 cache expiration - GET+TTL in redis terms - might need a return tweak, and we'd need to account for the fact that it isn't always going to be available, even if TTL is a thing on the backend

serializer injection: yep, fully covered - see edits above (dual API); I've used protobuf-net as the pseudo-example; I would advocate for not storing nulls, however - leaving the payload "pure" (rather than using a prefix/sentinel), which allows for better pre-existing-data/upgrade scenarios

jodydonetti commented 3 months ago

Hi all, and thanks @aKzenT for the mention of FusionCache, I'm happy it is serving you well.

Sorry for being late in answering, but lately I've been quite busy finishing the last touches on the preview release of v1.0 and also preparing for today's episode of On .NET, where they've been kind enough to invite me to talk about caching: if you think it can be useful, you can take a look at it for some of my views on distributed/multi-level caching that are relevant for this discussion, and why I think some approaches with distributed caches might be better than others (IMHO, of course).

I promise that in the next few days I'll post a full answer with what I think about each point that has emerged here, both in the original proposal and in the comments; I hope it can be useful to the discussion.

Thanks again to the team for sharing the design early!

ps: hi @mgravell , FusionCache exists also because of the work you and Nick (and others I'm sure!) did at StackOverflow, so thanks again and I like that it's all kinda a cycle of inspirations, one onto the others maybe. Also, I really hope to see you at this year's MVP Summit, it would be great to have a chat in person.

mgravell commented 3 months ago

I'll watch that video in the morning, thanks. All input is welcome and invited, and I'm particularly aware of the "Microsoft vs third-party" problem/politics, so I hope you understand that this is meant to increase choice and the out-of-box experience, not to limit options. On that note: if we're talking about adding optional supplemental APIs for IDistributedCache, you're probably very well placed to opine on any missing pieces that make things hard for your offering at the moment; I can't make guarantees, but: you can see above what extensions I have in mind - if you're sat there thinking "ooh, if only we had {X}, I could {Y}" : speak now!

I don't have any plans for MVP summit - it is a lot of travel, and I have a lot of complex commitments at home. But if you ever want to grab an online chat to talk about cache or anything else: I'll find a mutually convenient time.

mgravell commented 3 months ago

Video was good; looks like we're in a very similar area here; suggestion for @jodydonetti - IMO it feels like FusionCache would be a perfectly valid alternative implementation of IReadThroughCache, to give people more options and levels of maturity, contrasting with the "we want something in-box" demands. If this is of interest, I'd be happy to try contributing a PR to add this to FusionCache (although we both know this wouldn't be a challenge for you to add, either)

jamo-semler commented 3 months ago

This feature is a much needed (and appreciated) addition to the cache story in general in ASP.NET - very good initiative šŸ™

Is invalidation across L1 and L2 (across processes) in scope also?

We have a lot of cases where we scale an API or frontend across multiple instances which use an in-memory cache as L1 and e.g. Redis as L2. L1 is therefore only shared inside the process/instance, but L2 is shared across all processes/instances.

We have a lot of scenarios where we actively invalidate a cache based on an event or action in the application, but find it hard to invalidate both L1 (across processes/instances) and L2 in a clean way.

Right now we use the pub/sub feature of Redis to provide the backplane-like infrastructure that coordinates the inter-process communication.

It would be nice if this could be baked into ASP.NET in some way or other (maybe with different implementations like SignalR, Redis, Azure Pub/Sub etc.)

mgravell commented 3 months ago

Yes, cross-process cache invalidation is a key part of this work; that's what IDistributedCacheInvalidation is intended to achieve, providing a backend-neutral simple API for invalidations. Any backend providing this API would have its own way of coordinating this, but since you mention redis: separate to this piece, I am looking at all of

This list is in my preference order, but ideally I'd like to be able to support all 3

jodydonetti commented 3 months ago

Hi all, I've finally had time to read through the proposal and all the comments, and wow, a lot of food for thought šŸ˜…

Sorry for this being quite long, but I hope to give you the condensed version of a lot of experiences made in a lot of years, so it'll take some space.

Defining Expectations

In the issue for the new memory cache I've added some comments; in particular, in this one I highlighted something seemingly small but (imho) important about clearly defining expectations when presenting a new caching solution:

> The most important thing I'd like to highlight is that when this will be finalized and released (I suppose in the .NET 9 timeframe), it will be important to clearly explain the positioning of it. As already discussed on https://github.com/dotnet/runtime/issues/48567, caching is a tough beast with quite different use cases, the 2 primary ones being:
>
> • low-level caching: basically a dictionary + some form of expiration, where the data is typically homogeneous (of the same type), typically privately owned by a single piece of code (eg: internally by libraries like EF), where the data is not shared app-wide, the type of the cache key should be anything (so no boxing, etc) and with a limited feature set. Also, typically there will be a lot of different instances of it
> • app-level caching: is still caching, but more "high level", where the data may be heterogeneous (of different types), the data is (or may be) shared between different pieces of code, the type of the key is typically string to allow for some higher-level features like prefix/suffix addition for automatic scoping/fewer collisions, and with a generally richer feature set, with typically few instances of it
>
> Following this distinction I see this new RCache as part of the 1st category, and in this sense it feels good to me: the scope is quite narrow, no extra high-level features, and the attention to micro-benchmarking is quite high. FWIW, good šŸ‘

Regarding this issue: I would say the new stuff for IDistributedCache would fall into the first category (low-level), except for the strongly-typed <T> thing, whereas the new IReadThroughCache interface and friends would fall into the second one (high-level).

This distinction has served me well over the years, and I would suggest thinking about it (and maybe correcting it if you think it's wrong!) when deciding how to "split" features and expectations.

New IReadThroughCache interface

Honestly? I like the idea of a new interface that does not inherit from the existing IDistributedCache or try to extend it while masquerading the fact that it can also be multi-level, but is instead a separate interface that underneath will use IDistributedCache. It's kinda the same approach I took with IFusionCache, and it let users keep using all the existing implementations (Redis, Memcached, etc).

As a note, maybe interesting: I initially played with the idea of creating "just" a new impl of IDistributedCache and having it be a seamless "impl swap" with some "magic" underneath, but... no, really not the best idea, so this new abstraction LGTM!

Naming is hard

Since this is designed to also be potentially multi-level, I would suggest not tying the name to the fact that it is read-through (which is just a detail of how the GetOrCreate method works); just specify that in the docs/xml comments.

Honestly: my idea for FusionCache was to be a sort of "test bench" for a generic multi-level cache, and if it worked well I would've liked to propose a new caching abstraction to MS (basically this one we are talking about, so here we are finally šŸ˜…) named IHybridCache, which is the name usually used for this. I see you've played with IAdvancedCache too, so kinda in the same spirit. Another option can be IMultiLevelCache. Personally I like the hybrid one more, but it's just personal taste.

Just my 2 cents.

Sliding Expiration

One thing though about the API of IReadThroughCache: I'm personally not for the Refresh method, since that is related to sliding expiration, and in a multi-level cache that is something I've found quite confusing for people and difficult to implement consistently without surprises: if, for example, I get a value from a memory cache with a sliding expiration, that would extend the duration - but should that also apply to the distributed one? At every call? From every node (think a multi-node setup)?

In FusionCache what I've done is not give support for sliding expiration, but allow basically the same end result (something that is frequently used is basically always available) thanks to features like fail-safe, soft/hard timeouts and eager refresh. This would add some calls to the database every now and then, but it has the extra advantage that the data is always fresh even when frequently used, something that with sliding expiration may not happen.

I'd suggest thinking very carefully before committing to offer a Refresh method on a multi-level cache, to avoid having to handle surprises down the road.

Default entry options

There has been a mention of default entry options, so that users can call the methods without having to specify them every time: in FusionCache I have this concept explicitly, namely a DefaultEntryOptions in the global per-cache FusionCacheOptions object. This worked very well, so I'd suggest going with it. In FusionCache there's also the possibility to specify options per-call with an options => options.SetWhatever(...) lambda, which duplicates the DefaultEntryOptions and allows for per-call customization. It's nice and very handy, but watch out for the growing number of overload combinations needed to support it (ask me how I know that), so keep that in mind.

Naming of GetOrCreate[Async]

I like consistency and uniformity. Like, a lot. But something to point out is that the (ext) method with the same name on MemoryCache does not in fact protect from cache stampede. I know you all know this, but it's just something to think about, to avoid new users thinking the 2 methods with the same name act the same.

Maybe just a note in the docs and xml comments, something very visible? Just pointing this out.

Byte[] -> Buffers/Span/etc

Very good! This limitation in the API surface area of IDistributedCache did not allow for better perf, so I'm all for it; and if I got this right, the impl would check for something like if (cache is IBufferDistributedCache buffered) and use that, which I think is great.

Serializer with buffers

Same thing, also very good!

Two questions though:

What was the rationale for them? I don't know how useful it is in practice, but I'm just thinking about scenarios with big payloads: it happened to me to have to handle a ~1MB json cache entry, so I was just wondering.

Dependency Injection, builder and multiple caches

Initially I took the straightforward path of just using DI resolution: register an IFusionCache service and be good to go. But then the necessity came - both from myself as a user of FusionCache, and from the community - for multiple named caches, identifiable by name and configured differently.

In FusionCache I solved this by taking a similar approach to the one for named HTTP clients in .NET, since it was reasonable and the community would be used to it. Maybe now with keyed services in DI there's a different way, so I'd suggest looking into that, too.

So for the high-level read-through cache I suggest looking, from the get-go, at supporting this approach, too.

Watch out though for cache key collisions between different caches that share the same underlying Redis instance, for example: we can use a different database index, but that does not work in cluster mode, so usually the way is a cache key prefix, specifiable per-cache.

Also, if there are different things to configure per-cache, a builder approach worked quite well for me, and it now allows me to add new stuff without creating a ton of new ctor overloads or anything like that.

Serializers per-type

Eh, this does not sound that good to me. Let me explain: as a user of a caching solution, a lot of times you need to add data of new types to the cache, and having to create new serializers every time is probably not the best dev experience. I see that you are thinking about maybe using an attribute, something like [DataContract] or even a new one: although this may make things easier, it means it would be a leaky abstraction where a caching concern leaks into the models. Also, the time will come when users are not able to touch an existing class just to be able to cache it, and in that case they would need to create or extend a custom serializer for those cases, probably ending up with a pollution of serializer factories.

Also, a frequently used serialization format, although we know it's not optimized for this, is JSON: by using that we can serialize and deserialize (almost) whatever we want without worrying about specific types, and because of that the dev experience is nice and it is used a lot, so I would push for that kind of "it just works" dev experience for broader adoption.

Btw maybe I'm missing something here and the plumbing needed would be less than what I'm anticipating.

Another point is that (in my experience at least) it's quite rare to want different serialization formats in the same cache for different object types: in fact an alternative and pragmatic solution (which is what I took in FusionCache) is to just specify one serializer per cache. With the multiple named caches I talked about above this allows users to configure what they want without the excess of needing a serializer per-type.

Finally, I would not auto-register the STJ serializer: in my experience it's better to be explicit, otherwise you then need to have an un-register method or something like that. My 2 cents: just make it dead easy to opt in (AddJsonSerializer() or something like that) so that the decision is explicit and without doubts for the users.

Different IDistributedCache implementations

Minor, but since you listed some of the existing ones you may find it useful to see all the ones I've found available here.

One of them to take a look at is this one for SQLite: although it's useless in multi-node scenarios it can be very useful in standalone mobile/desktop apps, where it will ease the cold start problem. Maybe keep in touch with the authors to have them ready for the launch, some of my users are using it in a mobile game and have said great things.

Eager Prefetch

I have the same feature in FusionCache called Eager Refresh and it works well, so definitely go for it!

A couple of things to look out for.

In my case I opted for an option expressed as a % of the duration, for the reasons explained in the docs, so maybe think about it.

Also, I had some problems with EF Core (both for Eager Refresh and for Soft Timeouts) because of the thing you already mentioned, which is the lifetime of the DbContext. I'd like to ask the EF Core team if it would be possible to have something like a "deferred dispose" (meaning that when Dispose is called on the context, if it sees that there's a query running, it will wait for it to finish before actually disposing), but I think it would not be easy to implement because of various ramifications, and also I haven't had the time yet to open an issue in their repo. FWIW the solution I proposed to my community worked well, but it's not an "it just works" kind of thing and maybe somewhat cumbersome for some, so that is something to watch out for - maybe work with the EF Core team to come up with something and be prepared (because, I can tell you, people will complain about that šŸ˜…).

Tagging

This, so much this!

This is one of the features that I would've liked to have at my disposal, because it unlocks so many different scenarios I've been thinking about for a long time. BUT:

Just to stress this, I think this is so important (imho) that I've been thinking for a long time now about creating a new IFusionCacheLayer abstraction for a hypothetical next version of FusionCache, just to be able to have tagging: the problem is that I would lose the advantage of using existing IDistributedCache impls, so it's something I've deferred until now while doing other useful features.

I can think of various limitations on the number of tags supported, but capping it at some reasonable number would not make it useless, I think; on the other hand, I would say supporting only 1 tag is something not to look into. Either go for real multi-tags or don't bother doing it; it's not worth it (again, imho).

Multi GET/SET

This is another request I've gotten several times, so I'm sharing this. The ability to avoid the dreaded SELECT N+1 problem would be really nice, BUT at the same time it may take some time to clearly define both the API surface area (both input and output... dictionary?) and what should happen in case some cache keys are not there. Also, watch out for cache key prefixes and similar things, because if you have Get("foo", ...) and a cache key prefix of "bar:", the actual cache key sent to the actual cache storage will be "bar:foo", but then in the resulting dictionary (or whatever) the original "foo" should be used, in the context of a multi-get scenario.

Also, in a multi-level cache, what happens when the user asks for 10 cache keys and 5 are in L1 and 5 are not? We may decompose the list of cache keys and try to grab from L2 only the missing ones, etc. I've done similar things in the past on different projects, and honestly with great success, BUT it's something probably to avoid for a v1, because there are a lot of edge cases and niche situations to take into account.

L1 always there

Would L1 always be there? Is there a way to have only L2, instead of L1 + L2?

I'm asking because I've been there and people asked for FusionCache to be distributed only, and based on how you'll implement cache stampede protection that may be a problem, or not.

The dreaded Clear() method

I personally don't like a Clear() method, because it's not as easy to get right as it may seem (eg: cache key prefix, memory vs distributed, etc).

Having said that I'm just warning you that the request will come, like I'm 99% sure about this, so at least think about the answer šŸ˜…

Group based cache invalidation

Without tagging (if that would be the case), whenever I needed something like this I ended up doing some variation of what is normally called key-based cache expiration.

It's not a silver bullet, and it does not work all the time or in all scenarios, but it's something to keep in mind. Without tagging, you may want to suggest an alternative like this.
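For reference, a minimal sketch of the key-based expiration pattern on top of the existing IDistributedCache string extension methods; the "group-version:" key layout is just an illustration:

using Microsoft.Extensions.Caching.Distributed;

public static class GroupKeys
{
    // builds the effective cache key for a member of a group; rotating the
    // group's version entry makes every old member key unreachable
    public static async Task<string> GetGroupKeyAsync(IDistributedCache cache, string group, string key)
    {
        var versionKey = $"group-version:{group}";
        var version = await cache.GetStringAsync(versionKey);
        if (version is null)
        {
            // benign race: two concurrent initializers may briefly disagree
            version = Guid.NewGuid().ToString("N");
            await cache.SetStringAsync(versionKey, version);
        }
        return $"{group}:{version}:{key}";
    }

    // "invalidates" the whole group: old entries are never deleted, they just
    // become unreachable and age out via their normal expiration
    public static Task InvalidateGroupAsync(IDistributedCache cache, string group)
        => cache.SetStringAsync($"group-version:{group}", Guid.NewGuid().ToString("N"));
}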

Entry Options

I see you want to keep using the DistributedCacheEntryOptions type: are you planning to extend it with the missing options? I'm asking because I ended up adding more and more stuff as I added features, so that would probably happen here too. The good thing is that it's a class, so you can add stuff without necessarily introducing a breaking change (maybe just new ctors and some default values).
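For illustration, new knobs could land as nullable properties, with null meaning "current behavior", which keeps old callers working unchanged; the two added properties below are invented for the example, not proposed API:

public class DistributedCacheEntryOptions
{
    // existing members
    public DateTimeOffset? AbsoluteExpiration { get; set; }
    public TimeSpan? AbsoluteExpirationRelativeToNow { get; set; }
    public TimeSpan? SlidingExpiration { get; set; }

    // hypothetical additions; null preserves today's behavior
    public TimeSpan? EagerRefreshBefore { get; set; } // start a background refresh this long before expiry
    public bool? UseCompression { get; set; }
}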

Remote Invalidations

The IDistributedCacheInvalidation interface seems very minimal, and I don't know if you'll be able to pull this off with such a small API surface area.

But right now I can't say anything more specific; it's just a feeling. I have to see it in action or wait a little bit more.

For FusionCache I created the Backplane, which has a slightly wider API (but not that much, tbh), so maybe I'm a bit biased.

Various implementations of new IDistributedCache extended interfaces

A good rule of thumb I've seen over the years when creating new abstractions is that you need to have at least 3 quite different implementations working: that is a kind of magic number that does not guarantee 100% success, of course, but at least gives you a very good shot at being right.

In this case I would suggest being ready from day 1 with 4 impls: 2 from Microsoft (like Redis + SqlServer or CosmosDB; different technologies, but both "internal" for fast iteration) + 2 external (like Memcached + SQLite, with different constraints but still quite useful).

Again, my 2 cents.

About this and the future of FusionCache

As you said on Twitter, I can see FusionCache being one of the available implementations of IReadThroughCache (or whatever the name ends up being), maybe even from day 1.

In this sense I'd be glad to try and see if and how a 3rd party caching solution can get into this new abstraction.

Maybe something like this?

services.AddFusionCache()
  .WithOptions(options => {
    // ...
  })
  .WithDistributedCache(...)
  .WithBackplane(...)
  .AsReadThroughCache(); // THIS

So basically, count me interested: I'll gladly see what I can do!

Microsoft and OSS

I can see MS coming out with this new higher-level cache thing and people ignoring other 3rd party solutions, or maybe stifling innovation with a 100-ton common-denominator abstraction (like in the DI/logging world), which... yeah, I mean, that's what happened there, so there's definitely a risk here. I can see a potential future where any new feature I'd add to FusionCache would need to be hidden behind this new abstraction, because people would not want to depend on FusionCache as the abstraction, but only as an implementation.

But honestly I can also see value in a shared abstraction, and I don't have an answer ready that would solve this situation.

Maybe, just maybe, here's what may happen:

I don't know, maybe I'm dreaming.

Conclusions

So yeah, I had a lot of stuff I wanted to share with you; I hope you'll find at least some of it useful. I know it's a lot to take in, but better now than too late, methinks.

Thanks again for sharing this journey with us and being open to suggestions.

brockallen commented 3 months ago

Hi all, I've finally had time to read through the proposal and all the comments, and wow, a lot of food for thought šŸ˜…

I'm pretty sure your post is enough all by itself to renew your MVP, @jodydonetti. :)

mgravell commented 3 months ago

@jodydonetti That's a heck of a lot of words, and all great feedback from someone directly and appropriately relevant! I'm going to carve out a chunk of time tomorrow to go through it line by line, but from a first glance I think we're more or less on the same page on the key topics, which sounds great. A few more nuanced thoughts occurred to me, but I'll save them for tomorrow. Thanks for the great involvement :+1:

andreaskromann commented 3 months ago

Background

I maintain an internally developed caching framework with a similar feature set to FusionCache. This framework is used on a number of ecommerce solutions we develop. I started developing this caching framework before FusionCache existed. If I had to start over today, I would probably just use FusionCache instead, because getting this stuff right is a lot of work.

One of the main inspirations back then was this blog post by Nick Craver (of course one of @mgravell's old/current? colleagues at Stack Overflow): Nick Craver - Stack Overflow: How We Do App Caching - 2019 Edition. The outlined functionality around a "stale time" and refreshing in the background is something we use extensively today.

Below is my feedback on what I have seen in this thread so far.

MVP

I would absolutely love to be able to replace what we use today with something directly out of .NET. A minimum requirement for us would be the L1/L2-feature with L1-invalidation across nodes. We need that feature before we can use this new cache.

Bulk operations

We have bulk operations for Get, GetOrCreate, Set, Remove. For GetOrCreate it has the benefit that you can fetch the data from the source in one round-trip, which is a huge win. For Set & Remove it has the benefit that you can optimize the invalidation messages and bulk keys together instead of sending a message per key.

We optimized the Get and GetOrCreate to fetch what they can from L1, then try the remaining from L2 and then finally the data source. As @jodydonetti said above there are some pitfalls here though, so it might not be in scope for a v1.

Memory / L1 usage

This is more of a nice-to-have. We often have the pattern that some nodes write to the cache and others read. On the write-nodes it would be nice to have the option to not insert values into L1 and only write to L2. On the read-nodes it would be nice to have the option to eagerly re-fetch the invalidated key in order to keep the value in L1 (if possible).

Background Refresh Concurrency

One thing to be mindful of when implementing background refresh is that you can effectively create another stampede issue when a lot of values want to refresh at the same time. I control the concurrency of the background refreshes by using the TPL Dataflow library from .NET.
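For example, a minimal sketch of bounding refresh concurrency with TPL Dataflow; RefreshEntryAsync is a hypothetical callback that re-fetches a single key:

using System.Threading.Tasks.Dataflow;

var refreshBlock = new ActionBlock<string>(
    key => RefreshEntryAsync(key), // hypothetical per-key refresh
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 4, // at most 4 refreshes in flight
        BoundedCapacity = 1024      // back-pressure instead of an unbounded queue
    });

// on a stale hit: queue a background refresh; Post returns false when the
// queue is full, in which case this refresh attempt is simply skipped
refreshBlock.Post("some-cache-key");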

Resilience

Something I haven't seen mentioned yet is resilience. We have built-in retry logic with exponential back-off. This is implemented using Polly. Having something like this built in, the same way we have for HttpClient in .NET 8, could be valuable, but it's more of a nice-to-have.
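A minimal sketch of that kind of wrapping, assuming the classic Polly v7 API; GetFromL2Async is a hypothetical L2 call:

using Polly;

var retryPolicy = Policy
    .Handle<Exception>() // in practice, filter to transient failures only
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(100 * Math.Pow(2, attempt)));

byte[]? payload = await retryPolicy.ExecuteAsync(() => GetFromL2Async("some-key"));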

Observability

Support for traces, metrics & logging - maybe even OpenTelemetry. It is important to be able to see if your caching infrastructure is healthy, and when it's not healthy to get data on what is wrong.

Clear()

I completely agree with @jodydonetti on this one. I never gave that option to the cache users, because it can be quite the footgun if you use it outside dev/test environments.

Group based cache invalidation

The pattern where you want to cache values in one way for reads and update them in another way as a group is quite common. Our developers also had this need.

We thought a lot about how to support this use case. Maintaining a set of cache keys per tag didn't seem viable to us; we could always think of concurrency issues in those types of solutions. There are some tricky corner cases to handle when you invalidate/update tags at the same time you invalidate/update cache values.

What we came up with instead is a solution that is probably not for everyone, but is good enough for us. I will briefly outline it here for inspiration since it seems to be a topic of interest in this thread.

  1. We store tags directly in the cache values. We already wrapped the cache values to store some meta data, so this wasn't new to us.
  2. For each tag we store an invalidation time. We use the existing cache infrastructure for this. When you invalidate by tag we simply update the invalidation time.
  3. On cache reads, if the cache value has tags we check the tag's invalidation times against the cache value's creation time, and declare the cache value expired if any tag's invalidation time is greater than the cache value's creation time.

We make no attempt to actually remove the values that are invalidated by tag; we just let them expire and get cleaned up by the normal mechanisms. That is one obvious downside of this solution, but one we are willing to live with. Another is that you need to have your clocks in sync! But that's important in all distributed systems anyway.

A very good thing about this solution is that it is very simple to implement and reason about. We just use the existing infrastructure, we already have.
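A condensed sketch of that scheme follows; the envelope type and the per-tag timestamp lookup are illustrative, not our actual code:

public sealed class CacheEnvelope<T>
{
    public T Value { get; init; } = default!;
    public DateTimeOffset CreatedAt { get; init; }
    public string[] Tags { get; init; } = Array.Empty<string>();
}

public async Task<bool> IsExpiredByTagAsync<T>(CacheEnvelope<T> entry)
{
    foreach (var tag in entry.Tags)
    {
        // hypothetical lookup of the "invalidated at" timestamp that is
        // stored, via the normal cache infrastructure, under a per-tag key
        DateTimeOffset? invalidatedAt = await GetTagInvalidationTimeAsync(tag);
        if (invalidatedAt is not null && invalidatedAt > entry.CreatedAt)
            return true; // a tag was invalidated after this value was created
    }
    return false; // treat as a normal (non-expired) hit
}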

mgravell commented 3 months ago

@andreaskromann your notes on tagging are particularly useful here. I noted in the intro that we would like to support compression, but since there isn't a raw metadata concept in IDistributedCache, that means we would need to use some kind of framing/header (to indicate compressed/non-compressed, compression-type, etc); if we think we will need such a header anyway, maybe we bite the bullet and just go that route with tags+timestamp in the header too, which would allow us to implement lazy tagging. For example, consider the header:

{sentinel prefix value, maybe 4 known bytes}
{bit flags, varint-encoded}
[if bit flag 1==tagged set]
    {timestamp, some fixed bytes}
    {tag count, varint-encoded}
        {tag 0 byte length, varint-encoded}
        {tag 0 bytes, utf8 if string}
        ...
        {tag N byte length, varint-encoded}
        {tag N bytes, utf8 if string}
{actual payload, once header exhausted}

The sentinel prefix is there to prevent problems with pre-existing data that isn't header-prefixed; the chance of the prefix occurring naturally is acceptably small. Values without this sentinel are interpreted as raw payloads. Unexpected bit-flag combinations (during upgrades using new features not yet deployed to all nodes) should be treated as unreadable and count as a cache miss, since we cannot be sure we are interpreting the data correctly.

Some number of bits would be reserved for the compression format, with 0 being non-compressed.

This is a bit more complex, but much more powerful, and allows us to implement tagging and compression reliably inside IDistributedCache.
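As a very rough illustration of writing that layout (the sentinel bytes and flag values are invented here, and BinaryWriter.Write7BitEncodedInt, public on modern .NET, stands in for the varint encoding):

using System.IO;
using System.Text;

internal static class CacheHeader
{
    static readonly byte[] Sentinel = { 0xF0, 0x9F, 0x94, 0x91 }; // invented marker bytes
    const int FlagTagged = 0b0001;                                // invented flag bit

    public static void WriteHeader(BinaryWriter writer, DateTimeOffset timestamp, string[] tags)
    {
        writer.Write(Sentinel);                                       // sentinel prefix
        writer.Write7BitEncodedInt(tags.Length > 0 ? FlagTagged : 0); // varint bit flags
        if (tags.Length > 0)
        {
            writer.Write(timestamp.ToUnixTimeMilliseconds());         // fixed 8 bytes
            writer.Write7BitEncodedInt(tags.Length);                  // varint tag count
            foreach (var tag in tags)
            {
                var bytes = Encoding.UTF8.GetBytes(tag);
                writer.Write7BitEncodedInt(bytes.Length);             // varint tag length
                writer.Write(bytes);                                  // utf8 tag bytes
            }
        }
        // the actual (possibly compressed) payload is written after the header
    }
}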

The main problem I can see with this is that it then becomes harder to offer full feature parity with FusionCache, meaning: if we offer a "tags" parameter on the IReadThroughCache API, and FusionCache currently has no support for tags, it puts FusionCache in the awkward spot of either a) saying "we will silently ignore tags" or b) having to retrofit tags into a pre-existing system.

We'd still lack an external catalogue of timestamp expirations (for cold start), so if we went that route we'd probably need to extend IDistributedCache again to support that. Plus we'd need to integrate tag invalidation into the invalidation API (the backplane, in FusionCache terms).

Not saying we can/will do any of this - just sharing my thoughts as they occur.

jodydonetti commented 3 months ago

If I had to start over today, I would probably just use FusionCache instead, because getting this stuff right is a lot of work.

Glad to hear!

Memory / L1 usage

This is more of a nice-to-have. We often have the pattern that some nodes write to the cache and others read. On the write-nodes it would be nice to have the option to not insert values into L1 and only write to L2. On the read-nodes it would be nice to have the option to eagerly re-fetch the invalidated key in order to keep the value in L1 (if possible).

Yep, I've been there too, and in FusionCache I solved it by adding a SkipMemoryCache option after a community member explained the same scenario you outlined. I was hesitant at first, because without an L2 configured, if you skip L1 you are no longer protected from cache stampede, which can be tricky to understand when you fall into it (but it depends on how you've implemented cache stampede protection).

Background Refresh Concurrency

One thing to be mindful of when implementing background refresh is that you can effectively create another stampede issue...

Ah yes, good point I forgot to mention. In FusionCache I internally use the same mechanism for both a normal refresh cycle and for eager refresh, meaning that 1) no two eager refreshes can run at the same time (for the same cache key) and 2) if a request coming in starts an eager refresh that lasts for, say, 1 minute, and in that minute the cache entry expires, the following calls will wait for the eager one to finish instead of having 2 separate refreshes running (eager + normal).

Good point!
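The general technique (not FusionCache's actual code) can be sketched as a per-key in-flight task that both eager and normal refreshes share:

using System.Collections.Concurrent;

internal static class RefreshDeduplicator
{
    static readonly ConcurrentDictionary<string, Lazy<Task<byte[]>>> _inFlight = new();

    // all callers asking for the same key while a refresh is running get the
    // same task back, so only one refresh (eager OR normal) runs per key
    public static Task<byte[]> RefreshAsync(string key, Func<Task<byte[]>> factory)
    {
        var lazy = _inFlight.GetOrAdd(key, k => new Lazy<Task<byte[]>>(async () =>
        {
            try { return await factory(); }
            finally { _inFlight.TryRemove(key, out _); } // allow future refreshes
        }));
        return lazy.Value;
    }
}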

Resilience

Something I haven't seen mentioned yet is resilience. We have built-in retry logic with exponential back-off. This is implemented using Polly. Having something like this built in, the same way we have for HttpClient in .NET 8, could be valuable, but it's more of a nice-to-have.

Just to add to this: in FusionCache I decided to leave any local retry logic to the user, who can use Polly or whatever else they want in their factory method. Having said that, I added a simple circuit breaker directly inside FusionCache (opt-in + configurable) for the distributed components (distributed cache, backplane, etc) so that FusionCache can actively know when those components are temporarily unavailable, and reason about it.

Observability

Support for traces, metrics & logging - maybe even OpenTelemetry. It is important to be able to see if your caching infrastructure is healthy, and when it's not healthy to get data on what is wrong.

Agree: in low-level caching this is probably overkill, but in high-level/app-level caches it would be very important for when we go into detective mode to investigate issues. I was about to suggest "maybe not in v1" here, but honestly adding OpenTelemetry to FusionCache has been so quick and easy (after an initial study on how to correctly define naming conventions, tags with cardinality, etc) that maybe it's worth giving it a try from the get-go.

Group based cache invalidation

[...] What we came up with instead is a solution... [...] We make no attempt to actually remove the values that are invalidated by tag. We just let them expire...

I played with a similar idea in the past, but never fully designed it 100%.

The basic idea of "letting stuff expire" and having some sort of client-side timestamp-based filter is definitely something to explore, I think. It would allow the "outside world" to effectively see the same results, even if "on the inside" the data is still there.

It's similar to how I implemented fail-safe, where cache entries physically expire later but there's a LogicalExpiration prop that serves as a sort of "low-pass filter".

Worth a shot experimenting with it.
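In sketch form (property names here are illustrative, not FusionCache's actual internals):

public sealed class Entry<T>
{
    public T Value { get; init; } = default!;
    public DateTimeOffset LogicalExpiration { get; init; } // earlier than the physical TTL

    // a logically-expired value is treated as a miss (or as stale data for
    // fail-safe), even though it is still physically present in the cache
    public bool IsLogicallyExpired => DateTimeOffset.UtcNow >= LogicalExpiration;
}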

Another is that you need to have your clocks in sync! But that's important in all distributed systems anyway.

Eh, this is the elephant in the room in distributed systems: clock shifting, clock skew and clock drift are all things to consider, even though pragmatically, in most cases, you can play the game of pretending they do not exist šŸ˜…

Thanks for sharing!

jodydonetti commented 3 months ago

... that means we would need to use some kind of framing/header (to indicate compressed/non-compressed, compression-type, etc)

Just to understand this better: you mean that serializers will "just" serialize the normal payload, and the new cache infrastructure will add a header in front by "combining" the 2 byte[] (or buffers, etc)?

I'm asking because I can see 2 different steps: an "internal" low-level handling of the header bits & bytes, which the different serializers may not need to handle, and a "normal" payload serialization done by the serializer.

Is this the idea?

The main problem I can see with this is that it then becomes harder to offer full feature parity with FusionCache...

FWIW I'd say don't worry: this new effort is too important for it to be blocked from being developed because of a 3rd party project like mine.

If the feature is really needed and the design is right, I'd say go for it: it will be up to me to see if and how to fit that into a future FusionCache version.

jodydonetti commented 3 months ago

Oh, also: good point about thinking upfront about the evolution of the protocol version!

In FusionCache I ended up doing this: maybe it can be an idea to think about, since there are different possible solutions (an evolvable protocol, different key prefixes, etc) and they all have their pros and cons.

Happy to deepen the subject if needed.