[API Proposal]: Introduce new memory cache library

Background and motivation

We have two available memory cache libraries in .NET that are popular and well known - System.Runtime.Caching and Microsoft.Extensions.Memory.Cache. As already described in this issue there is a room for improvement in their public APIs. There are also efficiency problems that hit at high scale. Thus, our goal was to implement as efficient memory cache as possible for high concurrency and high traffic servers. During the process we've found a few additional opportunities to do slightly better than existing libraries.

Our implementation is fully generic for both keys and values, offering greater convenience and potentially improving memory efficiency by avoiding unnecessary boxing and type conversions.
Our implementation delivers noticeable performance improvements, with reduced latency and greater efficiency across operations.
Both System.Runtime.Caching and Microsoft.Extensions.Caching.Memory use mechanisms that may involve background threads or timers for managing expiration. In contrast, our implementation maintains state directly within set and get methods, avoiding thread pool overhead. Since it doesn't implement IDisposable, there's also no risk of memory leaks from incorrect disposal, unlike System.Runtime.Caching, which allocates timers on construction.
The API design can lead to inefficiencies by requiring heap-allocated objects, potentially increasing memory pressure at runtime.
No metrics are emitted by default, which is important for monitoring in distributed systems.

RCache implementation is based on open source library BitFaster.

Storing pair of <int, int> (this is the best situation for RCache since no boxing on key and value). The latency includes optional cost of maintaining telemetry state on RCache side.

Cache library	Remove	TryGetMiss	TryGetHit	GetOrSet	GetOrSet Dynamic TTL	Set	SetDynamicTTL
RCache	6.5 ns	10.0 ns	11.4 ns	14.7 ns	15.6 ns	28.9 ns	32.2 ns
Microsoft.Extensions.Caching.Memory	59.3 ns	39.0 ns	48.4 ns	52.1 ns	44.9 ns	125.5 ns	130.2 ns
System.Runtime.Caching	59 ns	54.8 ns	107.0 ns	175.8 ns	239.7 ns	1192.5 ns	1216.1 ns

Feature comparison table:

Feature	RCache	Microsoft.Extensions.Caching.Memory	System.Runtime.Caching
Time-based eviction	yes	yes	yes
Sliding time to evict	yes	yes	yes
Callbacks on eviction	no	yes	yes
Metrics	yes	no	no
Named caches	yes	no	no
Generics support	yes	no	no
Priority based eviction	yes**	yes	no
Runtime entry size calculation	no	yes	no
Dynamic Time To Evict	yes	yes	yes
Item update notification	no	yes	no

** Algorithm we use has a notion of three priorities (hot, warm, cold) and respect them while rotating items. Though, we don't allow to define priorities by customer or have direct control over it.

API Proposal

namespace Microsoft.Extensions.Cache.Memory;

/// <summary>
/// A synchronous in-memory object cache.
/// </summary>
/// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
/// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
public abstract class RCache<TKey, TValue> : IEnumerable<KeyValuePair<TKey, TValue>>
    where TKey : notnull
{
    /// <summary>
    /// Gets the name of the cache instance.
    /// </summary>
    /// <remarks>
    /// This name is used to identity this cache when publishing telemetry.
    /// </remarks>
    public abstract string Name { get; }

    /// <summary>
    /// Gets the capacity of the cache, which represents the maximum number of items maintained by the cache at any one time.
    /// </summary>
    public abstract int Capacity { get; }

    /// <summary>
    /// Tries to get a value from the cache.
    /// </summary>
    /// <param name="key">Key identifying the requested value.</param>
    /// <param name="value">Value associated with the key or <see langword="null"/> when not found.</param>
    /// <returns>
    /// <see langword="true"/> when the value was found, <see langword="false"/> otherwise.
    /// </returns>
    public abstract bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value);

    /// <summary>
    /// Sets a value in the cache.
    /// </summary>
    /// <remarks>
    /// The value's time to expire is set to the global default value defined for this cache instance.
    /// </remarks>
    /// <param name="key">Key identifying the value.</param>
    /// <param name="value">Value to associate with the key.</param>
    public abstract void Set(TKey key, TValue value);

    /// <summary>
    /// Sets a value in the cache.
    /// </summary>
    /// <remarks>
    /// After time to expire has passed, a value is not retrievable from the cache.
    /// At the same time cache might keep the root for it for some time.
    /// </remarks>
    /// <param name="key">Key identifying the value.</param>
    /// <param name="value">Value to cache on the heap.</param>
    /// <param name="timeToExpire">Amount of time the value is valid, after which it should be removed from the cache.</param>
    /// <exception cref="ArgumentOutOfRangeException">If <paramref name="timeToExpire"/> is less than 1 millisecond.</exception>
    public abstract void Set(TKey key, TValue value, TimeSpan timeToExpire);

    /// <summary>
    /// Gets a value or sets it if doesn't exist.
    /// </summary>
    /// <remarks>
    /// The value's time to expire is set to the global default value defined for this cache instance.
    /// </remarks>
    /// <typeparam name="TState">Type of the state passed to the function.</typeparam>
    /// <param name="key">Key identifying the value.</param>
    /// <param name="state">State passed to the factory function.</param>
    /// <param name="factory">A function used to create a new value if the key is not found in the cache. It returns the value to be cached.</param>
    /// <returns>
    /// Data retrieved from the cache or created by the passed factory function.
    /// </returns>
    public abstract TValue GetOrSet<TState>(TKey key, TState state, Func<TKey, TState, TValue> factory);

    /// <summary>
    /// Gets a value or sets it if doesn't exist.
    /// </summary>
    /// <remarks>
    /// After time to expire has passed, a value is not retrievable from the cache.
    /// At the same time cache might keep the root for it for some time.
    /// </remarks>
    /// <typeparam name="TState">The type of the state object passed to the factory function.</typeparam>
    /// <param name="key">The key identifying the cached value.</param>
    /// <param name="state">An additional state object passed to the factory function.</param>
    /// <param name="factory">
    /// A function used to create a new value if the key is not found in the cache. The function returns a tuple where the 
    /// first item is the value to cache, and the second item is a <see cref="TimeSpan"/> representing the duration for which 
    /// the value remains valid in the cache before expiring.    
    /// </param>
    /// <returns>
    /// The value associated with the key, either retrieved from the cache or created by the factory function.
    /// </returns>
    public abstract TValue GetOrSet<TState>(TKey key, TState state,
        Func<TKey, TState, (TValue value, TimeSpan timeToExpire)> factory);

    /// <summary>
    /// Gets a value or sets it if doesn't exist.
    /// </summary>
    /// <remarks>
    /// The value's time to expire is set to the global default value defined for this cache instance.
    /// </remarks>
    /// <param name="key">Key identifying the value.</param>
    /// <param name="factory">A function used to create a new value if the key is not found in the cache. It returns the value to be cached.</param>
    /// <returns>
    /// Data retrieved from the cache or created by the passed factory function.
    /// </returns>
    public abstract TValue GetOrSet(TKey key, Func<TKey, TValue> factory);

    /// <summary>
    /// Gets a value or sets it if doesn't exist.
    /// </summary>
    /// <remarks>
    /// After time to expire has passed, a value is not retrievable from the cache.
    /// At the same time cache might keep the root for it for some time.
    /// </remarks>
    /// <param name="key">Key identifying the value.</param>
    /// <param name="factory">
    /// A function used to create a new value if the key is not found in the cache. The function returns a tuple where the 
    /// first item is the value to cache, and the second item is a <see cref="TimeSpan"/> representing the duration for which 
    /// the value remains valid in the cache before expiring.
    /// </param>
    /// <returns>
    /// Data retrieved from the cache or created by the passed factory function.
    /// </returns>
    public abstract TValue GetOrSet(TKey key, Func<TKey, (TValue value, TimeSpan timeToExpire)> factory);

    /// <summary>
    /// Gets the current number of items in the cache.
    /// </summary>
    /// <returns>
    /// This method is inherently imprecise as threads may be asynchronously adding
    /// or removing items. In addition, this method may be relatively costly, so avoid
    /// calling it in hot paths.
    /// </returns>
    public abstract int GetCount();

    /// <summary>
    /// Attempts to remove the value that has the specified key.
    /// </summary>
    /// <param name="key">Key identifying the value to remove.</param>
    /// <returns>
    /// <see langword="true"/> if the value was removed, and <see langword="false"/> if the key was not found.
    /// </returns>
    public abstract bool Remove(TKey key);

    /// <summary>
    /// Attempts to remove the specified key-value pair from the cache.
    /// </summary>
    /// <param name="key">Key identifying the value to remove.</param>
    /// <param name="value">The value associated with the key to remove.</param>
    /// <returns>
    /// <see langword="true"/> if the specified key-value pair was found and removed; 
    /// otherwise, <see langword="false"/>.
    /// </returns>
    /// <remarks>
    /// This method checks both the key and the associated value for a match before removal.
    /// If the specified key exists but is associated with a different value, the cache remains unchanged.
    /// </remarks>
    public abstract bool Remove(TKey key, TValue value);

    /// <summary>
    /// Removes all expired entries from the cache.
    /// </summary>
    /// <remarks>
    /// Some implementations perform lazy cleanup of cache resources. This call is a hint to
    /// ask the cache to try and synchronously do some cleanup.
    /// </remarks>
    public abstract void RemoveExpiredEntries();

    /// <summary>
    /// Returns an enumerator that iterates through the items in the cache.
    /// </summary>
    /// <returns>
    /// An enumerator for the cache.
    /// </returns>
    /// <remarks>
    /// This can be a slow API and is intended for use in debugging and diagnostics, avoid using in production scenarios.
    /// 
    /// The enumerator returned from the cache is safe to use concurrently with
    /// reads and writes, however it does not represent a moment-in-time snapshot.
    /// The contents exposed through the enumerator may contain modifications
    /// made after <see cref="GetEnumerator"/> was called.
    /// </remarks>
    public abstract IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator();

    /// <summary>
    /// Returns an enumerator that iterates through the items in the cache.
    /// </summary>
    /// <returns>
    /// An enumerator for the cache.
    /// </returns>
    /// <remarks>
    /// This can be a slow API and is intended for use in debugging and diagnostics, avoid using in production scenarios.
    /// 
    /// The enumerator returned from the cache is safe to use concurrently with
    /// reads and writes, however it does not represent a moment-in-time snapshot.
    /// The contents exposed through the enumerator may contain modifications
    /// made after <see cref="GetEnumerator"/> was called.
    /// </remarks>
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

/// <summary>
/// Options for LRU (Least Recently Used) implementations of <see cref="RCache{TKey, TValue}"/>.
/// </summary>
/// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
public class RCacheLruOptions<TKey>
    where TKey : notnull
{
    /// <summary>
    /// Gets or sets the maximum number of items that can be stored in the cache.
    /// </summary>
    /// <value>
    /// Defaults to 1024.
    /// </value>
    [Range(3, int.MaxValue - 1)]
    public int Capacity { get; set; } = 1024;

    /// <summary>
    /// Gets or sets the default time to evict individual items from the cache.
    /// </summary>
    /// <remarks>
    /// This value is used by methods which do not accept an explicit time to evict parameter.
    /// If you don't want your items to be ever evicted due to time, set this value to <see cref="TimeSpan.MaxValue"/>.
    /// </remarks>
    /// <value>
    /// Defaults to 5 minutes.
    /// </value>
    [TimeSpan(minMs: 1)]
    public TimeSpan DefaultTimeToEvict { get; set; } = TimeSpan.FromMinutes(5);

    /// <summary>
    /// Gets or sets the amount of time by which an items's eviction time is extended upon a cache hit.
    /// </summary>
    /// <remarks>
    /// This value is ignored when <see cref="ExtendTimeToEvictAfterHit"/> is <see langword="false"/>.
    /// </remarks>
    /// <value>
    /// Defaults to 5 minutes.
    /// </value>
    [TimeSpan(minMs: 1)]
    public TimeSpan ExtendedTimeToEvictAfterHit { get; set; } = TimeSpan.FromMinutes(5);

    /// <summary>
    /// Gets or sets a value indicating whether an item's time to evict should be extended upon a cache hit.
    /// </summary>
    /// <value>
    /// Defaults to <see langword="false"/>.
    /// </value>
    public bool ExtendTimeToEvictAfterHit { get; set; }

    /// <summary>
    /// Gets or sets the cache's level of concurrency.
    /// </summary>
    /// <remarks>
    /// Increase this value if you observe lock contention.
    /// </remarks>
    /// <value>
    /// Defaults to <see cref="Environment.ProcessorCount"/>.
    /// </value>
    [Range(1, int.MaxValue)]
    public int ConcurrencyLevel { get; set; } = Environment.ProcessorCount;

    /// <summary>
    /// Gets or sets the custom time provider used for timestamp generation in the cache.
    /// </summary>
    /// <remarks>
    /// If this value is <see langword="null"/>, the cache will default to using <see cref="Environment.TickCount64"/>
    /// for timestamp generation to optimize performance. If a <see cref="TimeProvider"/> is set, 
    /// the cache will call the <see cref="TimeProvider.GetTimestamp"/> method.
    /// The <see cref="TimeProvider"/> should primarily be used for testing purposes, where custom time manipulation is required.
    /// </remarks>
    /// <value>
    /// Defaults to <see langword="null"/>.
    /// </value>
    public TimeProvider? TimeProvider { get; set; }

    /// <summary>
    /// Gets or sets the comparer used to evaluate keys.
    /// </summary>
    /// <value>
    /// Defaults to <see cref="EqualityComparer{T}.Default"/>,
    /// except for string keys where the default is <see cref="StringComparer.Ordinal"/>.
    /// </value>
    public IEqualityComparer<TKey> KeyComparer { get; set; }
        = typeof(TKey) == typeof(string) ? (IEqualityComparer<TKey>)StringComparer.Ordinal : EqualityComparer<TKey>.Default;

    /// <summary>
    /// Gets or sets a value indicating how often cache metrics are refreshed.
    /// </summary>
    /// <remarks>
    /// Setting this value too low can lead to poor performance due to the overhead involved in
    /// collecting and publish metrics.
    /// </remarks>
    /// <value>
    /// Defaults to 30 seconds.
    /// </value>
    [TimeSpan(min: "00:00:05")]
    public TimeSpan MetricPublicationInterval { get; set; } = TimeSpan.FromSeconds(30);

    /// <summary>
    /// Gets or sets a value indicating whether metrics are published or not.
    /// </summary>
    /// <value>
    /// Defaults to <see langword="true"/>.
    /// </value>
    public bool PublishMetrics { get; set; } = true;
}

/// <summary>
/// Builder for creating <see cref="RCache{TKey, TValue}"/> instances.
/// </summary>
/// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
/// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
public class RCacheLruBuilder<TKey, TValue>
    where TKey : notnull
{
    /// <summary>
    /// Initializes a new instance of the <see cref="RCacheLruBuilder{TKey,TValue}"/> class.
    /// </summary>
    /// <param name="name">Name of the cache, used in telemetry.</param>
    /// <exception cref="ArgumentNullException">Thrown when the <paramref name="name"/> is null.</exception>
    public RCacheLruBuilder(string name);

    /// <summary>
    /// Sets the options for the cache.
    /// </summary>
    /// <param name="options">Cache options.</param>
    /// <returns>The current instance of the <see cref="RCacheLruBuilder{TKey,TValue}"/>.</returns>
    /// <exception cref="ArgumentNullException">Thrown when the <paramref name="options"/> is null.</exception>
    public RCacheLruBuilder<TKey, TValue> WithOptions(RCacheLruOptions<TKey> options);

    /// <summary>
    /// Sets the meter factory for the cache.
    /// </summary>
    /// <param name="meterFactory">Meter factory for telemetry.</param>
    /// <returns>The current instance of the <see cref="RCacheLruBuilder{TKey,TValue}"/>.</returns>
    public RCacheLruBuilder<TKey, TValue> WithMeterFactory(IMeterFactory? meterFactory);

    /// <summary>
    /// Builds the <see cref="RCache{TKey, TValue}"/> instance with the specified configurations.
    /// </summary>
    /// <returns>A ready-to-use <see cref="RCache{TKey, TValue}"/> instance.</returns>
    /// <exception cref="OptionsValidationException">Thrown when the validation of options fails.</exception>
    /// <exception cref="InvalidOperationException">Thrown when the meter factory is null but metrics publishing is enabled.</exception>
    public RCache<TKey, TValue> Build();
}

/// <summary>
/// Extension methods for caching.
/// </summary>
public static class RCacheExtensions
{
    /// <summary>
    /// Adds an LRU (Least Recently Used) cache to the dependency injection container.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Dependency injection container to add the cache to.</param>
    /// <returns>The value of<paramref name="services"/>.</returns>
    /// <exception cref="ArgumentNullException">When passed <paramref name="services"/> are <see langword="null"/>.</exception>
    public static IServiceCollection AddRCacheLru<TKey, TValue>(this IServiceCollection services)
        where TKey : notnull;

    /// <summary>
    /// Adds a named LRU (Least Recently Used) cache to the dependency injection container.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Dependency injection container to add the cache to.</param>
    /// <param name="name">Name of the cache, used in telemetry.</param>
    /// <returns>The value of<paramref name="services"/>.</returns>
    /// <exception cref="ArgumentNullException">When passed <paramref name="services"/> are <see langword="null"/>.</exception>
    /// <exception cref="ArgumentException">When passed <paramref name="name"/> is <see langword="null"/> or empty.</exception>
    public static IServiceCollection AddRCacheLru<TKey, TValue>(this IServiceCollection services, string name)
        where TKey : notnull;

    /// <summary>
    /// Adds an LRU (Least Recently Used) cache to the dependency injection container.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Dependency injection container to add the cache to.</param>
    /// <param name="configure">A function used to configure cache options.</param>
    /// <returns>The value of<paramref name="services"/>.</returns>
    /// <exception cref="ArgumentNullException">When passed <paramref name="services"/> or <paramref name="configure"/> are <see langword="null"/>.</exception>
    public static IServiceCollection AddRCacheLru<TKey, TValue>(this IServiceCollection services, Action<RCacheLruOptions<TKey>> configure)
        where TKey : notnull;

    /// <summary>
    /// Adds an LRU (Least Recently Used) cache to the dependency injection container.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Dependency injection container to add the cache to.</param>
    /// <param name="section">Configuration part that defines cache options.</param>
    /// <returns>The value of<paramref name="services"/>.</returns>
    /// <exception cref="ArgumentNullException">When passed <paramref name="services"/> or <paramref name="section"/> are <see langword="null"/>.</exception>
    public static IServiceCollection AddRCacheLru<TKey, TValue>(this IServiceCollection services, IConfigurationSection section)
        where TKey : notnull;

    /// <summary>
    /// Adds a named LRU (Least Recently Used) cache to the dependency injection container.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Dependency injection container to add the cache to.</param>
    /// <param name="name">Name of the cache, used in telemetry.</param>
    /// <param name="section">Configuration part that defines cache options.</param>
    /// <returns>The value of<paramref name="services"/>.</returns>
    /// <exception cref="ArgumentNullException">When passed <paramref name="services"/> or <paramref name="section"/> are <see langword="null"/>.</exception>
    /// <exception cref="ArgumentException">When passed <paramref name="name"/> is <see langword="null"/> or empty.</exception>
    public static IServiceCollection AddRCacheLru<TKey, TValue>(this IServiceCollection services, string name,
        IConfigurationSection section)
        where TKey : notnull;

    /// <summary>
    /// Adds a named LRU (Least Recently Used) cache to the dependency injection container.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Dependency injection container to add the cache to.</param>
    /// <param name="name">Name of the cache, used in telemetry.</param>
    /// <param name="configure">A function used to configure cache options.</param>
    /// <returns>The value of<paramref name="services"/>.</returns>
    /// <exception cref="ArgumentNullException">When passed <paramref name="services"/> or <paramref name="configure"/> are <see langword="null"/>.</exception>
    /// <exception cref="ArgumentException">When passed <paramref name="name"/> is <see langword="null"/> or empty.</exception>
    public static IServiceCollection AddRCacheLru<TKey, TValue>(this IServiceCollection services, string name,
        Action<RCacheLruOptions<TKey>> configure)
        where TKey : notnull;
}

/// <summary>
/// Metrics published by <see cref="RCache{TKey, TValue}"/>.
/// </summary>
public static class RCacheMetrics
{
    /// <summary>
    /// Name of the <see cref="Meter"/> to listen to is the name of the cache.
    /// </summary>
    /// <example>
    /// RCache with name "test" will publish metrics with tag "cache-name" equal to "test".
    /// </example>
    public const string LruCacheMeterName = "Microsoft.Extensions.Cache.Memory";

    /// <summary>
    /// Gets the total number of cache queries that were successful.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="ObservableCounter{T}"/> with value being <see cref="long"/>.
    /// </remarks>
    public const string Hits = "rcache.hits";

    /// <summary>
    /// Gets the total number of unsuccessful cache queries.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="ObservableCounter{T}"/> with value being <see cref="long"/>.
    /// </remarks>
    public const string Misses = "rcache.misses";

    /// <summary>
    /// Gets the total number of expired values.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="ObservableCounter{T}"/> with value being <see cref="long"/>.
    /// Expired values are one removed from cache because they were too old.
    /// </remarks>
    public const string Expirations = "rcache.expirations";

    /// <summary>
    /// Gets the total number of values added to cache.
    /// </summary>
    /// <remarks>
    /// This value refers to total calls to set method, and does not include updates.
    /// Metric is exposed as <see cref="ObservableCounter{T}"/> with value being <see cref="long"/>.
    /// </remarks>
    public const string Adds = "rcache.adds";

    /// <summary>
    /// Gets the total number of cache removals.
    /// </summary>
    /// <remarks>
    /// This value refers to total calls to remove method, and does not include evictions.
    /// If you are interested in metric of total values being removed from the cache, add <see cref="Evictions"/> and <see cref="Expirations"/>.
    /// Metric is exposed as <see cref="ObservableCounter{T}"/> with value being <see cref="long"/>.
    /// </remarks>
    public const string Removals = "rcache.removals";

    /// <summary>
    /// Gets the total number of evicted values.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="ObservableCounter{T}"/> with value being <see cref="long"/>.
    /// Evicted values are those removed based on the implementation's eviction policy and not time expiration or intentional removal.
    /// </remarks>
    public const string Evictions = "rcache.evictions";

    /// <summary>
    /// Gets the total number of updated values.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="ObservableCounter{T}"/> with value being <see cref="long"/>.
    /// </remarks>
    public const string Updates = "rcache.updates";

    /// <summary>
    /// Gets the total number of values in the cache over time.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="ObservableGauge{T}"/> with value being <see cref="long"/>.
    /// </remarks>
    public const string Count = "rcache.entries";

    /// <summary>
    /// Gets the gauge of elements compacted in the cache.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="Histogram{T}"/> with value being <see cref="long"/>.
    /// </remarks>
    public const string Compacted = "rcache.compacted_entries";
}

API Usage

Add default cache to DI.

IServiceCollection services;

services.AddRCacheLru<Guid, User>();

services.AddRCacheLru<Guid, User>("different", options => options.PublishMetrics = false);

Get default cache from DI.

namespace Example;

using Microsoft.Extensions.Cache;

public class UserService
{
    private readonly RCache<Guid, User> _users;

    public UserService(RCache<Guid, User> users)
    {                  ^^^^^^^^^^^^^^^^^^^^^^^^
        _users = users
    }
}

Create cache using builder.

namespace Example;

using System.Diagnostics.Metrics;
using Microsoft.Extensions.Cache;

public class UserService
{
    private readonly RCache<Guid, User> _users;

    public UserService(IMeterFactory meterFactory)
    {
        _users = new RCacheLruBuilder<Guid, User>("my-cache-name")
            .WithMeterFactory(meterFactory)
            .Build();
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    }
}

Alternative Designs

Callbacks and notifications We could introduce a feature from Microsoft.Extensions.Caching.Memory that allows to track if value was changed or implement eviction callbacks. Instead of storing a callback with each entry or some change tokens we could use System.Threading.Channels library. Through exposed ChannelReader, consumer could react on events like eviction, mutation and so on. I expect this design to be more memory/cpu efficient than existing one.

Size check We could implement item size limit through interface implemented on stored type by library client.

public interface ISizer
{
   int GetSizeInBytes();
}

This design would allow us to implement size check without boxing TValue or introducing CPU branches on hot-path.

Asynchronous interface

We wanted to be explicit that everything in RCache should be sync. This approach allows us not to go into distributed systems problems, since async caches requires much richer semantics. We are going to propose another interface in the future that covers asynchronous caches scenarios and more.

Risks

Another cache This is yet another memory cache implementation which may confuse .NET community. We should be clear in framing the purpose of this interface, and make sure design is extendable enough so we don't need to repeat the same process in the future.

Layering Is dotnet/extensions the right place for this code? For me it looks like a general purpose data structure that should be available for the whole stack. Should we introduce it to runtime repository?

Deferred removal Cache item removal is deferred which can keep alive objects for longer than they need to be.

Does it interoperate with TimeProvider?

No, it doesn't. The whole purpose of this library is to speed up things. We would need to introduce time provider in the hot-path, which adds a virtual dispatch overhead. We can add 'FakeLruRCache' which would be no different from original implementation beside using TimeProvider. That would allow to test algorithm alone.

update

Actually, we could add TimeProvider in options as well, and until it would be overriden manually we could ignore it. I think it would solve the perf concern.

What is the thread safety model here of the valueFactory? I typically wrap MemoryCache to ensure that the value factory is only called once in case of a cache miss. This always seemed self evident to me, since you wouldn't be caching in the first place if the valueFactory was cheap.

And if there is a locking mechanism, is it global or is it dependent on the key, like in ConcurrentDictionary?

What is the thread safety model here of the valueFactory?

We are not preventing stampedes atm, so there is a chance that factory is called more than once. Our model is basically the same what concurrent dictionary GetOrSet is doing. The solutions I see:

Lazy{T},
Using accepting TState GetOrSet overload and implement locking in there
Add one more overload for RCache like GetOrSetLazy (?) that encapsulates Lazy approache in the cache itself.
Extension method over RCache that is doing one of the mentioned above and comes from library.
Explicitly add different cache implementation like 'LruAtomicWrites'.

If there is something I am missing let me know.

And if there is a locking mechanism, is it global or is it dependent on the key, like in ConcurrentDictionary?

RCache is wrapping over concurrent dictionary and it has no global locks.

Nice. I like it. If the Lazy usecase for expensive valueFactories is nicely documented and tested (single execution guraranteed), then it does not need extra API support I believe.

What I do VERY often is Lazy<Task<TValue>>. So that concurrent requests await the same long running task. Please take this into account in examples, tests and documentation. 🙏

Please consider support for "keyed service dependency injection".
[FromKeyedServices("usersCache")] [FromKeyedServices("projectsCache")] Typically I would configure expiration times on the level of the keyed instance of cache for all items in the once cache. But maybe your idea about naming these and obtaining from factory is a better consideration.

The library was originally developed before key service support was released, so it comes with:

public interface IRCacheProvider
{
  public RCache<TKey, TValue> Get<TKey, TValue>(string name);
}

and getting it by name from DI:

namespace Example;

using Microsoft.Extensions.Cache;

public class UserService
{
    private readonly RCache<Guid, User> _users;

    public UserService(IRCacheProvider caches)
    {
        _users = caches.Get<Guid, User>("my-cache-name");
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    }
}

I didn't include it in the proposal because it will be changed towards keyd services.

@rafal-mz Is RCache in a repo somewhere to peek at? I've used and even customized Bitfaster in the past so curious to see what changes were made in this case.

RCache is not yet public. I will create a PR and link here as soon as it is possible.

Interesting proposal. Thanks for asking the community for feedback. I am sure lots of us value effort being put into the caching bits of the framework.

For me the Lazy ("cache stampede") factory evaluation requirement is pretty fundamental. Personally I struggle to find use cases that warrant a cache without lazy factory. Either not needing a lazy factory shows you don't actually need a cache at all (aka premature optimisation), or you need a cache such that it must have a lazy factory to not overwhelm your system. My experience says it should be lazy factory by default, and lazy factory is what people expect when using a cache for the first time. But I totally accept I'm biased as the author of a library that does just that! I would like to hear from use cases that need a cache without lazy factory behaviour.

And a couple of questions:

Can you explain the name RCache? What does it stand for? Doesn't seem intuitive to me.
I understand the need to add a new cache API for higher performance caching but why not just fix/expand the existing Microsoft libraries? I'm sure there is good reason but it needs setting out in writing because a framework with 3 official cache libraries is not a great experience either.
What is the difference between the proposal and the existing open source Bitfaster library? Could that be improved instead? Why not just support and direct people to existing existing open source community work if they need the highest performance cache?

Thanks for your feedback. I really appreciate it.

Either not needing a lazy factory shows you don't actually need a cache at all (aka premature optimisation).

I would not go that far. Cache can be used as general purpose data structure that keep memory constrained through eviction policy and capacity limit. I've seen (written few myself :-)) far too many concurrent dictionaries called 'cache' that were accumulating memory to an infinite. RCache is just a concurrent dictionary wrapper that adds efficient eviction policy, so I would not limit the use cases to 'costly to create'. Sometimes it may be 'costly to keep forever'. Generally, the cost of creation matter less the longer entry TTL.

So far I've had an opinion that fast and simple is better than safe and slow. Now I am not sure. Need to think it through.

Can you explain the name RCache? What does it stand for? Doesn't seem intuitive to me.

We just had to have a different name than existing MemoryCache. RCache was developed in a project that has a name starting with 'R'.

I understand the need to add a new cache API for higher performance caching but why not just fix/expand the existing Microsoft libraries?

.NET team has long lasting culture of keeping strict backward compatibility guarantees. That's why expanding is hard when original implementations are not abstract classes. Microsoft.Extensions.Caching.Memory is getting better with every version but if the interface is locked it is only so much you can do.

Hopefully the interface we propose now can be extendable enough that in the future we can still improve the library and not introduce yet another API.

I'm sure there is good reason but it needs setting out in writing because a framework with 3 official cache libraries is not a great experience either.

We are aware of this problem that's why we call it out in risks. As mentioned above we cannot remove we can just add.

What is the difference between the proposal and the existing open source Bitfaster library? Could that be improved instead? Why not just support and direct people to existing existing open source community work if they need the highest performance cache?

RCache so far implements one algorithm out of many that are available in BitFaster. Some technical details in RCache are different compared to original. Author of BitFaster is a Microsoft employee and he was involved in the process.

Its worth to mention that there is also a layering problem. For instance, for a library to be used in runtime repo, it needs to be pulled there. This is also something we call out in risks.

Interesting proposal. Thanks for asking the community for feedback. I am sure lots of us value effort being put into the caching bits of the framework.

For me the Lazy ("cache stampede") factory evaluation requirement is pretty fundamental. Personally I struggle to find use cases that warrant a cache without lazy factory. Either not needing a lazy factory shows you don't actually need a cache at all (aka premature optimisation), or you need a cache such that it must have a lazy factory to not overwhelm your system. My experience says it should be lazy factory by default, and lazy factory is what people expect when using a cache for the first time. But I totally accept I'm biased as the author of a library that does just that! I would like to hear from use cases that need a cache without lazy factory behaviour.

This does bring up the point that, If there is a good way to add some additional abstractions to make Lazy evaluation simple and efficient, that would be great.

At the same time, I know of at least two larger OSS libs that may be able to benefit from a more raw base abstraction. (in one case I saw equivalent performance with lower memory usage compared to a very nice, hand crafted robin hood hash)

And a couple of questions:

* What is the difference between the proposal and the existing open source Bitfaster library? Could that be improved instead? Why not just support and direct people to existing existing open source community work if they need the highest performance cache?

If I had to guess, From my own experience hacking at it, Bitfaster is a very good caching library, however if you need a portion of the functionality it is better to specialize the implementation you can get even more from it. Kinda per above, on the average it's really good, but a purpose trimmed and tuned 'general but fixed use case' version is competitive with a 'hand chosen and written algo' cache.

From my own experience hacking at it, Bitfaster is a very good caching library, however if you need a portion of the functionality it is better to specialize the implementation you can get even more from it.

I am not familiar with it but was contributing to "Bitfaster" considered as an alternative? (Can we cc their owners here?)

@alastairtree I believe the value of a Lazy model is for a small cache with a high likelihood of concurrent accesses to the same state. But in a high traffic service which may have millions of entries in cache, the risk of wasted work due to concurrent evaluation of the factory method is low enough to simply not be a concern.

However, if it is a concern, then the cache can simply be configured to hold Lazy and then it will work as needed and prevent redundant calls to the factory method. I think that seems like a good place to be.

Now even if your cache sometimes experiences collisions and does redundant work as a result, it doesn't mean use of Lazy is warranted. It depends on the observed rate of collisions. Using a Lazy imparts a definite measurable cost to every use of the cache, whereas collisions only cost you when they occur. Thus a low collision rate may be cheaper than using Lazy.

I am excited about the introduction of a new memory cache and will follow its final specification if it proceeds. My experience with caching is not detailed enough to properly take part in the current level of conversation but I am happy to provide input and feedback as the design and first iteration progresses. Picking Bitfaster and extending/improving it sounds like the right plan. The current proposal appears to nearly match the existing Bitfaster or am I missing some key items?

@RobDeVoer It's very similar on the inside, but the API surface has been considerably simplified by hiding all the complications around how monomorphisation is achieved. And we have some ideas in the pipeline which may deliver additional raw performance to the code as well, but those are still experimental in nature.

Hi all, here's my thoughts on this effort: I tried to condense all of them into a single answer, therefore sorry for it being quite long.

First of all, as others already said, thanks for discussing this in public to gather the community's opinions, including the ones by OSS maintainers like me, @alastairtree , @Turnerj and others. Really appreciated!

The most important thing I'd like to highlight is that when this will be finalized and released (I suppose in the .NET 9 timeframe), it will be important to clearly explain the positioning of it. As already discussed on the previous issue caching is a tough beast with quite different use cases, the 2 primary ones being:

low-level caching: basically a dictionary + some form of expiration, where the data is typically homogeneous (of the same type), typically privately owned by a single piece of code (eg: internally by libraries like EF), where the data is not shared app-wide, the type of the cache key should be anything (so no boxing, etc) and with a limited feature set. Also typically there will be a lot of different instances of it
app-level caching: is still caching, but more "high level", where the data may be heterogeneous (of different types), the data is (or may be) shared between different pieces of code, the type of the key is typically string to allow for some higher level features like prefix/suffix addition for automatic scoping/less collisions and with a generally richer feature set, with typically few instances of it

Following this distinction I see this new RCache as part of the 1st category, and in this sense it feels good to me: the scope is quite narrow, no extra high-level features and the attention to micro-benchmarking is quite high. FWIW, good 👍

RCache

As pointed out by others the name is quite strange: I feel you in wanting to avoid some other uber general/all encompassing name and to avoid confusion with the other 2 existing _memory cache_s, so that this also feels "smaller/specialized/optimized" (again, 1st category from above). Having said that, naming it RCache just because the project where it originated started with an "R"... I don't know. Maybe something that communicate the low-level/no-frills nature of it like RawCache or similar? Mind you, it's probably a minor thing, but still wanted to point that out.

Both caches implementations are using background threads / timers. Our implementation maintains its state on set method, thus sparing some threadpool overhead over existing libraries. Our implementation is also friendlier to use, since it won't create a memory leak when not disposed (compared to System.Runtime.Caching which allocates timers on ctor).

Understood, but to do this it seems you are not cleaning up automatically: not having the code to look at I'm wondering if a SET call of key "foo" will also cleanup the expired entry for cache key "bar". If this is not the case you may run into memory remaining allocated by cache entries that ended up not being touched for a long time, so this is something to look out for. A different approach may be to include timed background cleanup but as optional, with something like a bool flag in the general options or a TimeSpan? (null being the default, so no auto-cleanup). This way by default no leaks will be guaranteed but, if wanted, the ease of it can be obtained. I think the related code should not be that much.

** RCache is disposing entries on eviction and it can be used as eviction callback.

The feature is nice, but watch out for this (been there): I would enable automatic Dispose calls on eviction only as an opt-in feature (like bool EnableAutoDisposeOnEvict), since sometimes I saw usages in the wild where something is get from the cache and kept used only to become randomly disposed while still being used, for this exact behavior.

API Proposal ...

In general it looks quite good to me: I like that it is an abstract class with different factory methods so that currently there's only the LRU special case but I can imagine in the future there may be others (LFU, TLRU, PLRU, MRU, RR, FIFO, etc).

A couple of notes though:

factory methods: why 2 of them instead of only one with name + options and options being nullable? If tomorrow a new param param3 would arrive, that may generate new permutations (name + param3, name + options + param3, etc). Nothing major, just wondering
named caches: thinking about the DI support, I assume calling CreateLru<TKey, TValue>("foo") multiple times will return the same instance right? What about calling it twice with the same name but different options? Or with the same name but with different types? Is the name unique per-types-combination or global?
options: RCacheLruOptions<TKey, TValue> why the generic typing here? Currently there does not seem to be nothing justying that, and as it is it would avoid sharing the same options with different instances. Maybe is it to make it future proof? Admittedly the cost here is nothing, so again just wondering
ExtendedTimeToEvictAfterHit vs ExtendTimeToEvictAfterHit: I suggest to use a Is/Enable/etc prefix for bool options. In this case specifically the 2 options looks quite identical, so by using something like EnableExtendedTimeToEvictAfterHit it would make it clearer which one is which
methods overloads: I see different overloads with/without optional params (like 4 different GetOrSets). I understand the need but down the line it may be quite daunting to maintain/evolve with this approach
factory signature: you are using the specific params currently used, I can imagine to avoid bloat. While I feel that, I also can see a problem in the future when new factory params/return values may be needed, and then you would need to add more overloads and handle them all and the permutations will become quite a lot (been there...)
the bool TryGet(out ...) signature will not be usable in case one day you'd like to add async support, so you'll eventually end up with 2 different methods with 2 very different signatures for the sync and async versions. I'd suggest to start immediately with a signature that will be able to support async. In my project I have used a simple MaybeValue<T> struct, kinda like a Nullable<T> but for both ref/value type (like a maybe/option monad). It works well, is not heavy on resources and is easily evolvable into the async territory

Another thing is that, as already said by others, the lack of cache stampede prevention is quite a big thing. I can feel not wanting to go there, but at the same time it's something quite important and expected, and something that is already missing in the current MemoryCache (with the added "bonus" there that people usually think the GetOrCreate ext method will protect them, when instead it will not).

Because of this, if you decide not to implement it, I think it should be very clear to consumers that there's no protection there.

Also, now that I'm thinking about it: if no stampede prevention is provided, and no async support is provided (see later), why having a GetOrSet at all?

By not having it:

it would make it clear that there's no stampede prevention
it would avoid having to support its different overloads (including the TState support)

I mean, I understand it's an attractive method to have, but it may be counterproductive.

Alternative Designs

Callbacks and notifications [...] Instead of storing a callback with each entry or some change tokens we could use System.Threading.Channels [...] I expect this design to be more memory/cpu efficient than existing one

Oh yes please: if you decide to go with the callbacks route, a single stream of events would definitely be better then a ton of specific callbacks stored inside each cache entry.

Size check We could implement item size limit through interface implemented on stored type by library client.
public interface ISizer
{
   int GetSizeInBytes();
}
This design would allow us to implement size check without boxing TValue or introducing CPU branches on hot-path.

This may be a nice idea: in case it will go forward I think it should be explained more and experimented with, including some practical examples on how to implement it.

Asynchronous interface

We wanted to be explicit that everything in RCache should be sync.

Understood, but as said by providing a GetOrSet method with a factory passed to it, I think users will expect both of these:

stampede is take care of
both sync/async are supported

Anyway fingers crossed for you: I hope it will be enough for the community... but I'm not that sure.

This approach allows us not to go into distributed systems problems, since async caches requires much richer semantics.

Eh, definitely, I feel that 😅

We are happy though to add 'TryAdd' to our proposal, to avoid stampedes for async factories.

How would a TryAdd method solve the stampede for async operations? If there's stampede prevention, it should be in the normal GetOrSet method(s) available, maybe optionally (like the new output cache in aspnet).

We are going to propose another interface in the future that covers asynchronous caches scenarios and more.

I'm interested in this, will follow!

Enumeration We decided not to implement enumeration on RCache:

Good call!

A couple of final things.

This looks good for caches of homogeneous values, but for heterogeneous values? Would we just use RCache<TKey, object> and then cast the result or is there something else to look out for?

Capacity: what happens when the Capacity is reached? I think I've missed the explanation.

Regarding logging: is there a plan to support logging or is it way too overkill at such a low level? I can imagine the latter, but just asking.

Ok, I think it's all.

Thanks again for sharing this with us!

Thanks for your feedback. :-)

when this will be finalized and released (I suppose in the .NET 9 timeframe)

If this is going to end up in dotnet/extensions then we have 1 month release cadence for this repository.

Following this distinction I see this new RCache as part of the 1st category, and in this sense it feels good to me: the scope is quite narrow, no extra high-level features and the attention to micro-benchmarking is quite high. FWIW, good 👍

I agree with this classification and how you position RCache there. Application level cache IMHO must have much richer semantics to support all those features and it affects overall performance. Lower level cache can be also used as a building block for creating app level caches.

Maybe something that communicate the low-level/no-frills nature of it like RawCache or similar?

I am not very opinionated on the name. I like RCache since it is short and quite unique. If it will lay in appropriate namespace and will be well documented I don't see an object name to be point of confusion. Probably whatever name we choose there will be somebody that don't like it, so I will leave this decision to you.

Understood, but to do this it seems you are not cleaning up automatically: not having the code to look at I'm wondering if a SET call of key "foo" will also cleanup the expired entry for cache key "bar". If this is not the case you may run into memory remaining allocated by cache entries that ended up not being touched for a long time, so this is something to look out for. A different approach may be to include timed background cleanup but as optional, with something like a bool flag in the general options or a TimeSpan? (null being the default, so no auto-cleanup). This way by default no leaks will be guaranteed but, if wanted, the ease of it can be obtained. I think the related code should not be that much.

You are correct with the first three sentences. We may leave uncleared memory when cache is long untouched. This is the reason our interface has RemoveExpiredEntries method, so that if you know this might be your case you can implement automatic clearup easily. We can also create some helpers that do it for you. LRU implementation that we have now is not using background threads but the LFU implementation that we will come up with next will do. One of the core ideas of this proposal is - this is not just one implementation. This is the first one.

The feature is nice, but watch out for this (been there): I would enable automatic Dispose calls on eviction only as an opt-in feature (like bool EnableAutoDisposeOnEvict), since sometimes I saw usages in the wild where something is get from the cache and kept used only to become randomly disposed while still being used, for this exact behavior.

Thats great experience. I will add this to proposal. Thanks a lot!

factory methods: why 2 of them instead of only one with name + options and options being nullable? If tomorrow a new param param3 would arrive, that may generate new permutations (name + param3, name + options + param3, etc). Nothing major, just wondering

I expect 'Options' parameter to absorb new features instead of adding new permutations to factory method. This is just a preference decision, since nullable is semantically equivalent to what we have now. I think that interfaces that require passing null explicitly are just longer to write and not always clear to interpret.

named caches: thinking about the DI support, I assume calling CreateLru<TKey, TValue>("foo") multiple times will return the same instance right? What about calling it twice with the same name but different options? Or with the same name but with different types? Is the name unique per-types-combination or global?

RCache factory is a factory, so creating the cache with the same name twice will return different instance. (Not using DI)
Using `AddLruRCache' in ServiceCollection with the same name multiple times will configure one instance of RCache that is unique in DI per name and types. We are following .NET options pattern, so each configuration will apply in sequnce.
As stated above, name + types is a differentiatior. Cache with the same name and different types will have its own instance.

So to summary, DI is global per name and key value combination. Creating 'RCache' by hand gives you no protection.

options: RCacheLruOptions<TKey, TValue> why the generic typing here? Currently there does not seem to be nothing justying that, and as it is it would avoid sharing the same options with different instances. Maybe is it to make it future proof? Admittedly the cost here is nothing, so again just wondering

By having RCacheLruOptions follow the same generic parameters as RCache itself, you can have shorter signature for creating a cache using factory since types will be inferred.
This is future proof if we want any TValue specific option in options we don't need to do a breaking change. (Examples: calculating TValue size func in options)

ExtendedTimeToEvictAfterHit vs ExtendTimeToEvictAfterHit: I suggest to use a Is/Enable/etc prefix for bool options. In this case specifically the 2 options looks quite identical, so by using something like EnableExtendedTimeToEvictAfterHit it would make it clearer which one is which

Noted, will think about clearer naming.

methods overloads: I see different overloads with/without optional params (like 4 different GetOrSets). I understand the need but down the line it may be quite daunting to maintain/evolve with this approach

Could you give an example when maintaing/evolving it is daunting?

factory signature: you are using the specific params currently used, I can imagine to avoid bloat. While I feel that, I also can see a problem in the future when new factory params/return values may be needed, and then you would need to add more overloads and handle them all and the permutations will become quite a lot (been there...)

If I understand correctly you think that the RCache factory may be returning some wrap over TValue and TTL and accept different parameters than TKey and TState?

I think that having TState allow you to pass any parameter type, and returning TValue to return any type of value. Could you share your experience with the issues you had with permutations?

the bool TryGet(out ...) signature will not be usable in case one day you'd like to add async support, so you'll eventually end up with 2 different methods with 2 very different signatures for the sync and async versions. I'd suggest to start immediately with a signature that will be able to support async. In my project I have used a simple MaybeValue struct, kinda like a Nullable but for both ref/value type (like a maybe/option monad). It works well, is not heavy on resources and is easily evolvable into the async territory

RCache will never have async support. This is explicit design decision and proposal is not supposed to evolve in this direction.

Also, now that I'm thinking about it: if no stampede prevention is provided, and no async support is provided (see later), why having a GetOrSet at all?

As explained above by Martin, stampade can be easily fixed by returning Lazy from a cache. For big services collision rate is small, so even when cost of creation is high you don't need this protection.

Understood, but as said by providing a GetOrSet method with a factory passed to it, I think users will expect both of these:

Concurrent dictionary does not support async and does not prevent stampedes but still supports GetOrSet. Why users would expect different behavior from RCache than ConcurrentDictionary?

How would a TryAdd method solve the stampede for async operations?

I've made a mistake, it won't. I've removed this part from proposal.

This looks good for caches of homogeneous values, but for heterogeneous values? Would we just use RCache<TKey, object> and then cast the result or is there something else to look out for?

This is one possible solution.

Capacity: what happens when the Capacity is reached? I think I've missed the explanation.

When capacity reaches nothing happens. When an item is added to a cache with full capacity, cache will evict one entry in approx Lru order.

Regarding logging: is there a plan to support logging or is it way too overkill at such a low level? I can imagine the latter, but just asking.

I think it would be an overkill but we can add optional logging for diagnostic purposes that would be turned off by default. The question is what the log lines would be about? If the case is to log TKey and TValue there comes a problem of not capturing any sensitive data.

Small remark. I realize that we aim primarily for the highly perf, low allocation scenarios (like ConcurrentDictionary). That is a good call. I might use it for many of my ConcurrentDictionaries.

I am a big fan of the TValue GetOrSet<TState>(TKey key, TState state, Func<TKey, TState, TValue> factory); overload, which allows to save allocation of TState closure for the factory that might need it.

I just hope that it will not be implemented as is in the BitFaster caching repo like this V GetOrAdd<TArg>(K key, Func<K, TArg, V> valueFactory, TArg factoryArgument) => this.GetOrAdd(key, k => valueFactory(k, factoryArgument));

Where this line indeed creates a closure of the valueFactory and factoryArgument. Sorry if that is obvious, just wanted to make sure.

No, its not implemented like that. This overload was specifically added to avoid closure allocation.

If this is going to end up in dotnet/extensions then we have 1 month release cadence for this repository.

Ah, probably even better.

I agree with this classification and how you position RCache there. Application level cache IMHO must have much richer semantics to support all those features and it affects overall performance. Lower level cache can be also used as a building block for creating app level caches.

Exactly, that's the idea!

Probably whatever name we choose there will be somebody that don't like it

Yep, most definitely 🤷

so I will leave this decision to you.

Ahah thanks, but part I'm not a Microsoft employee :-)

Anyway, another idea that just came up: MemoryCacheSlim. It's a known and common naming pattern in .NET and already have ManualResetEvent and ManualResetEventSlim, Semaphore and SemaphoreSlim, etc and each "slim" variant is kinda known to be the "lightweight version with somewhat less features" of the "full" version. On the other hand maybe in this case the 2 designs (MemoryCache and this new one) would be quite different so this convention may not apply. I don't know, just spitballing here at the moment for everybody to think about it.

You are correct with the first three sentences. We may leave uncleared memory when cache is long untouched. This is the reason our interface has RemoveExpiredEntries method, so that if you know this might be your case you can implement automatic clearup easily. We can also create some helpers that do it for you. LRU implementation that we have now is not using background threads but the LFU implementation that we will come up with next will do.

Understood, and I like having the ability to not having stuff running in the background and being able to fully control manual calls to RemoveExpiredEntries manually. Having said that, I still think the support to do it automatically in an opt-in way with the implementation already provided would be something people would like, and the code may be small in theory. But of course I may be wrong here, and without looking at the code I'll wait and see.

Btw I later noticed that the abstract class already implements IDisposable, so I was wondering the rationale here: without a timer there would not be a potential timer leak, got it, but what is IDisposable here for then? There mey still be leaks of something else?

One of the core ideas of this proposal is - this is not just one implementation. This is the first one.

Totally got it, and I really like it!

Thats great experience. I will add this to proposal. Thanks a lot!

You're welcome!

I expect 'Options' parameter to absorb new features instead of adding new permutations to factory method.

Makes sense. I still have a doubt about what can actually go into options, when we also consider the IOptions<T> pattern and the way it can naturally be configured via the general config system (so JSON deserialization and stuff), and I have the same concern in FusionCache, too. Some time ago I was wondering about this with both Davids (Fowler and Pine) in an email thread: should sub-components like an ITimeProvider or similar (so not just plain values) be put directly into an options object or on separate ctor params, but I never got to a definitive answer about it. Do you see an hypothetical ITimeProvider or others non-plain-values into the options class itself?

This is just a preference decision, since nullable is semantically equivalent to what we have now. I think that interfaces that require passing null explicitly are just longer to write and not always clear to interpret.

Agree, but I usually also declare the nullable params with a default value of null, to get the best of both worlds. Anyway it's more of a stylistic preference probably, so I get the different vibes.

RCache factory is a factory, so creating the cache with the same name twice will return different instance. (Not using DI)

Using `AddLruRCache' in ServiceCollection with the same name multiple times will configure one instance of RCache that is unique in DI per name and types. We are following .NET options pattern, so each configuration will apply in sequnce.

As stated above, name + types is a differentiatior. Cache with the same name and different types will have its own instance.

Yeah sorry, after posting my comment I got to it, sorry for the time wasted.

So to summary, DI is global per name and key value combination. Creating 'RCache' by hand gives you no protection.

Makes sense, thanks.

This is future proof if we want any TValue specific option in options we don't need to do a breaking change. (Examples: calculating TValue size func in options)

Right, makes sense.

Could you give an example when maintaing/evolving it is daunting?

Eh, (un)luckily yes 😅: see here, here and here. They are the same class (splitted in 3 partial files for organization purposes), and they represent the permutation of all the optional params and signatures. I started with some, and then added support for more params, so I had to add all the possible combinations of params to keep the thing aligned with the general approach. Maybe this will not apply here, and maybe it's not so bad (I had to do it once), but I preferred to warn you guys about this, maybe you can take some inspiration on what to potentially avoid.

If I understand correctly you think that the RCache factory may be returning some wrap over TValue and TTL and accept different parameters than TKey and TState? I think that having TState allow you to pass any parameter type, and returning TValue to return any type of value. Could you share your experience with the issues you had with permutations?

For the return types of the factory lambda, adding new overloads with new lambda signatures is a pita, whereas adding a new prop to an existing return type is much easier and will not explode into a lot of variations. On the other hand I can understand the potential extra allocations even when not needed, but still is something to look out for.

As an exercise I suggest to try to keep the current design, and see what would needs to change in the future by adding an extra return type, not just TValue or (TValue, TimeSpan), but also the Hotness thing you mentioned, let's say to let consumers configure how important something is (it's just an example). How would that work down the line?

RCache will never have async support. This is explicit design decision and proposal is not supposed to evolve in this direction.

Ok, I was referring to this passage "We are going to propose another interface in the future that covers asynchronous caches scenarios and more" and thinking about consistency overall.

As explained above by Martin, stampade can be easily fixed by returning Lazy from a cache.

Yes, as in ConcurrentDictionary, but then on the consumer side I saw a lot of people wanting to avoid always having to do dict[key].Value etc and ending up creating their own LazyConcurrentDictioanry or whatever to shield them from having to directly work with Lazy<T> every time. Also, they need to understand the LazyThreadSafetyMode. Just my two cents.

For big services collision rate is small, so even when cost of creation is high you don't need this protection. Mmmh, this sounds strange: I had members of the FusionCache community telling me that just by swapping MemoryCache with FusionCache they saw somewhere around 50% less calls to the database. Of course this wholly depends on the access patterns involved, so it may very well depends on a lot of factors. And anyway you surely have way more field data than me 😅 so I trust you guys here.

Concurrent dictionary does not support async and does not prevent stampedes but still supports GetOrSet. Why users would expect different behavior from RCache than ConcurrentDictionary?

I'm going out on a limb here maybe, but 2 things come to mind: 1) it's not that they expect have different expectations, it's that in my experience most of the times they expect protection in the dictionary too, and are biten by it down the line when they discover that they are not. Of course there's the whole RTFM and stuff, but I'm just pointing this out after having seen it multiple times 2) the thing is called cache stampede, and this is a cache, so expectations may differ

This is one possible solution.

Got it, thanks!

When capacity reaches nothing happens. When an item is added to a cache with full capacity, cache will evict one entry in approx Lru order.

Got it.

I think it would be an overkill but we can add optional logging for diagnostic purposes that would be turned off by default.

As it is today, logging is probably overkill, I agree. I was just thinking in perspective. Anyway if it may be useful to you and the team, have a read here as I've been there in the past, maybe it can be of help with public expectations. In the end the standard way of having to configure the min log level (via namespace prefix etc) ended up being the right call, imho, but I'm still wondering if a global (meaning per-instance) bool flag would be a nice addition, to avoid inadvertently enabling logging for the cache, and for every cache instance. Since here we are driving for max perf, a nice little bool flag (default to false) to explicitly enable it may be a nice addition, on top of receiving an ILogger instance whichwould happen automatically via DI.

If the case is to log TKey and TValue there comes a problem of not capturing any sensitive data.

Good call, haven't thought about it.

The question is what the log lines would be about?

In FusionCache there can be a lot of stuff going on: async calls, automatic background factory completion, distribuetd cache + backplane communications, eager refresh and so on so logging becomes quite fundamental for when my users go into detective mode (and for my own sanity too, when developing new features 🤣). But here, in such a low-level + hi-perf case, probably it's not worth it, you're right.

Anyway... woah, what a good discussion to have, thanks again, I really appreciate it!

Ahah thanks, but part I'm not a Microsoft employee :-)

I've meant to leave this decision to the community, reviewers. For me its just a name.

Btw I later noticed that the abstract class already implements IDisposable, so I was wondering the rationale here: without a timer there would not be a potential timer leak, got it, but what is IDisposable here for then? There mey still be leaks of something else?

If the cache holds TValues implementing IDisposable, it will dispose each of those at its own disposal. The class is also holding System.Diagnostics.Metrics.Meter, which is IDisposable.

As an exercise I suggest to try to keep the current design, and see what would needs to change in the future by adding an extra return type, not just TValue or (TValue, TimeSpan), but also the Hotness thing you mentioned, let's say to let consumers configure how important something is (it's just an example). How would that work down the line?

The presented model is general and cannot have any assumptions about underlying algorithm. If we would return 'hotness' or 'priority' it would mean that all of the implementations must have a notion of prioritization. IMHO to specify such details in a correct way one needs to have deep understanding of the implementation details, algorithm used, data access patterns and excesssive measurements. So, I would call it a leaky abstraction.

There are some algorithm details which may be useful for tuning the cache for high performance. Such details (if exposed) will be part of the options object for specific implementation. In this case it would be RCacheLruOptions<TK, TV>

Do you see an hypothetical ITimeProvider or others non-plain-values into the options class itself?

I believe that anything like ILogger, TimeProvider should go into ctor and be taken from DI. First, it allows to configure cross cutting concerns once and the whole system 'pick them up' automatically. Second, it makes the contract implicit, so that it is less work for library developer to maintain the interface. Third (this is just my impression not backed by data), I think that configuration model is not that well understood in the community and folks often build service provider multiple times to pass some class instance to options.

Funny enough, I think in case of time provider and RCache it will be taken from options as instance. The reason for that is we have public factory that don't know about DI container.

I saw a lot of people wanting to avoid always having to do dict[key].Value etc and ending up creating their own LazyConcurrentDictioanry or whatever to shield them from having to directly work with Lazy every time. Also, they need to understand the LazyThreadSafetyMode.

I think (again as Martin already mentioned) we want people to pay for what they use. I don't like the idea of slowing down everybody by default to shield less experienced folks from shooting their foot. I would love to do both at the same time. :-)

the thing is called cache stampede, and this is a cache, so expectations may differ

Thats fair. I will think about it.

Given that this is a being pitched as a low level collection, and builds upon the concurrent dictionary, had any thought been put into placing it in System.Collections.Concurrent and calling it ConcurrentExpiringDictionary?

I know this loses the cache in the name name but maybe that is no bad thing because it differentiates with the existing caches and makes clear this is primarily designed to be a low level collection and not an application level cache.

Also it has the bonus that it's a more descriptive name that fits in more closely with the rest of the naming style in the framework.

I am the original author of ConcurrentLru on which RCache is based – so I can give some context on the design goals of the underlying algorithm and its performance characteristics. It’s also worth noting that ConcurrentLru is already used in many Microsoft production services serving tens of billions of requests per day - it’s battle tested and was born out of a practical need to replace MemoryCache to improve performance and stability.

The high-level design goals were along the lines of:

Optimize for throughput, in particular the cache hit ‘hot path’. In the services I work on, it is common to get very high frequency bursts of traffic accessing the same key. Making this lock free and fast under load gives us outsized gains.
Optimize hit rate, mitigate scanning. It’s common for downstream services to send us requests to check if there is work to do. Such services tend to for-loop over all the instances in the world waking them up and loading caches. This leads to ‘one hit’ items evicting hot items from MemoryCache, reducing hit rate and/or triggering memory exhaustion.
Mitigate the ‘memory thrashing’ failure mode. MemoryCache has a tendency to evict items that are in use causing them to be loaded again in a cycle. If the items are large/complex, this can quickly cause memory pressure/excessive GC. Once the CPU is pegged and there is a high influx of new items, the eviction thread may not run promptly causing a cascading failure as the machine runs out of memory and dies. I have seen multiple variants of this in production systems.
Pay for play. MemoryCache imposes memory and latency costs for all features even when they are not used. In a large scale service memory costs money and reducing the per item overhead makes a difference for large caches. We aggressively shrink everything because smaller heaps lead to faster GC and lower COGs.
Generic keys. It’s very common for us to use Guids or pairs of Guids to identify things, and boxing allocations on lookups should be avoided.

Not an exhaustive list – just to give an idea the focus was on having a strict bounded size cache that keeps the most useful items, suitable for use in the hot path of a high scale web application.

The bounded size part is important, cache management is amortized within cache writes by performing constant time operations without locks, and unlike MemoryCache it will not exceed capacity under load and doesn’t require a background thread. Instead, foreground threads cooperate to keep the cache within bounds using a variant of the 2Q algorithm (similar to that used in Memcached and OpenBsd). Implementing this algorithm efficiently without a background maintenance thread is to my knowledge novel.

A great deal of effort and tuning has been put into maximizing throughput and hit rate, and the approach is well suited to modern web servers. There are tradeoffs however, and there are a couple of caveats worth mentioning:

Since the algorithm is based on 3 concurrent queues, on machines with higher CPU count the queues can become a bottleneck when many threads contend on the head/tail of each queue. This will manifest as high CPU consumption due to spin waits inside ConcurrentQueue.Enqueue/Dequeue. That might sound scary, but in practice this can only really occur when there are constant very high frequency cache misses, and the cache factory does almost no computation. Cache factories do computational work even when throwing an exception and consequently it is hard to provoke this in the wild. For desktop/container use, this is a very unlikely failure mode. For a monolithic web app that runs on a 64 CPU VM more care is needed.
When used with time-based expiry, the cache will tend to hold onto items. If there are no writes, nothing will be evicted. This can be mitigated by periodic expiry on a background thread. On demand expiry is an O(n) operation since we depend on FIFO queues. The queues are optimized for concurrency, so it’s a tradeoff where we chose to favor throughput with strict bounded size.
It is not a pure LRU, and items are not evicted in LRU order. The cache can be in either a cold or warm state depending on queue fullness and will exhibit different patterns of eviction in each state. This is not intuitive and can cause confusion/alarm when replaying simple sequences of lookups and observing the results. The relative sizes of the internal queues have been carefully tuned to optimize hit rate from real world production traffic.

Small remark. I realize that we aim primarily for the highly perf, low allocation scenarios (like ConcurrentDictionary). That is a good call. I might use it for many of my ConcurrentDictionaries.

I am a big fan of the TValue GetOrSet<TState>(TKey key, TState state, Func<TKey, TState, TValue> factory); overload, which allows to save allocation of TState closure for the factory that might need it.

I just hope that it will not be implemented as is in the BitFaster caching repo like this V GetOrAdd<TArg>(K key, Func<K, TArg, V> valueFactory, TArg factoryArgument) => this.GetOrAdd(key, k => valueFactory(k, factoryArgument));

Where this line indeed creates a closure of the valueFactory and factoryArgument. Sorry if that is obvious, just wanted to make sure.

@Richard004 Not related to this discussion but worth clarifying: the quoted code above is the default implementation of the interface method that exists for backwards compatibility (in other words if you had implemented ICache yourself from an earlier version of my library, you will not have this new method and use this default as a fallback). The actual implementation on ConcurrentLru of course does not allocate a closure.

From my own experience hacking at it, Bitfaster is a very good caching library, however if you need a portion of the functionality it is better to specialize the implementation you can get even more from it.

I am not familiar with it but was contributing to "Bitfaster" considered as an alternative? (Can we cc their owners here?)

@danmoseley I of course would welcome any contributions and would like to see my library help folks to build better software. Related to the point above, I work hard to preserve backwards compatibility because I work on services spread across many repos and breaking changes are painful to absorb.

So I think having a solid option built into the framework is valuable. Anyone who had to update Newtonsoft.Json in a complex multi-repo service will understand why a solid framework option benefits the ecosystem and can be a huge time saver.

The only thing this is missing for it to be a drop-in replacement for the LRU in Orleans is a bool Remove(KeyValuePair<K, V>) API, which ConcurrentDictionary<K,V> implements & BitFaster.Caching uses internally in several spots. We use this to remove known-invalid entries under concurrency without clobbering the entry if it's been updated.

Why public abstract bool Remove(TKey key); API isn't sufficient?

Sometimes you need to be sure you're removing the key only when the value matches what you already have, either from a previous lookup or cached somewhere. This is important when multiple of these replacing operations might concurrently run.

Why public abstract bool Remove(TKey key); API isn't sufficient?

Because of the last part: "under concurrency without clobbering the entry if it's been updated." If an entry has already been replaced by a valid entry, we don't want the Remove operation to evict it since that would require us to recreate and populate the entry again. Recreating the entry is a remote call which is performed outside of the cache itself, serialized via a worker so that all calls for a given entry go to the same worker to prevent a flurry of duplicate work for a given hot key.

That's fair. I will add it.

@NinoFloris I gotta say, that use pattern never crossed my mind. But it's a totally reasonable pattern.

Will it compare the value by calling Equals? If so, will it hold any locks during that call?

Overall, I think this design will be useful and I understand it in light of the popularizing of microservices. But I actually am looking for an app-level caching solution.

Alternative Designs Callbacks and notifications We could introduce a feature from Microsoft.Extensions.Caching.Memory that allows to track if value was changed or implement eviction callbacks. Instead of storing a callback with each entry or some change tokens we could use System.Threading.Channels library. Through exposed ChannelReader, consumer could react on events like eviction, mutation and so on. I expect this design to be more memory/cpu efficient than existing one.

Size check We could implement item size limit through interface implemented on stored type by library client.
public interface ISizer
{
int GetSizeInBytes();
}
This design would allow us to implement size check without boxing TValue or introducing CPU branches on hot-path.

Glad these are being considered, both for the reasons expressed and for the ability to extend with custom cache management functionality. Unfortunately, IMO they are being applied to something that is even more low-level cache vs. app-level cache than the existing MemoryCache it is designed to improve. They would help me add features I need like app-level cache dependency's and memory management. I do wish Microsoft would provide some guidance on how to approach replacing System.Web.Caching behavior we have become so accustomed of.

Overall, I think this design will be useful and I understand it in light of the popularizing of microservices. But I actually am looking for an app-level caching solution.

Looking at the OSS space there are some app-level caching libs available, each with their own design and feature sets, have you tried them?

@rafal-mz I kept thinking about this, and 2 things came to mind which I'd like to share.

1) Hide Concrete Type

The first is if there's a reason to "hide" the concrete class returned by the static factory methods? I mean this:

public static class RCache
{
    public static RCache<TKey, TValue> CreateLru<TKey, TValue>(string name) where TKey : notnull;

    ...

Shouldn't it return the specialized LruRCache<TKey, TValue> instead of the abstract base class RCache<TKey, TValue> ? This way hypothetical specialized methods, specific for the LRU impl, would be available to consumers without casting it?

2) Extensibility

The second thing is about ergonomics about extensibility (lightweight, I'm not going into the factory factory factory territory here).

If you want other people (either in the OSS space or in their own private projects) to be able to come up with different implementations like the ones mentioned above (LFU, RR, etc) while still remaining in this new RCache space, the static-factory-on-static-class approach would lead to different ways to instantiate different flavours of an RCache.

A different design for the builder I would suggest can be something like this:

public class RCacheBuilder {
    // here the ctor can be marked as internal, if needed
}

public static class RCache
{
    public static RCacheBuilder Builder { get; }

    static RCache() {
      Builder = new RCacheBuilder();
    }
}

Perf wise the cost is basically zero, but it would allow others to chip in with custom implementations via extension methods in a natural way, by having a non-static (but still singleton, so statically accessible) builder instance to allow attaching ext methods to.

The current factory methods for the LRU variant can remain on the RCacheBuilder class (since they are provided with .NET itself) or they can become ext methods, like this:

public static class LruRCacheExtensions {
    public static LruRCache<TKey, TValue> CreateLru<TKey, TValue>(this RCacheBuilder builder, string name) where TKey : notnull;
}

The way to instantiate would be like this:

// from this
var cache = RCache.CreateLru<Guid, Guid>(...);

// to this
var cache = RCache.Builder.CreateLru<Guid, Guid>(...);

No big deal I think.

This way though people in the OSS space can create their own implementations, like this:

public static class BeladyRCacheExtensions {
    public static BeladyRCache<TKey, TValue> CreateBelady<TKey, TValue>(this RCacheBuilder builder, string name) where TKey : notnull;
}

So it would be possible to use them like this:

// LRU
var lruCache = RCache.Builder.CreateLru<Guid, Guid>(...);

// BELADY
var beladyCache = RCache.Builder.CreateBelady<Guid, Guid>(...);

Hope this helps.

Looking at the OSS space there are some app-level caching libs available, each with their own design and feature sets, have you tried them?

@jodydonetti Yes, I have looked at everything I have seen mentioned in this and couple other places. However, I don't think any have everything I am looking for. Also, it is generally very difficult to onboard OSS solutions where I work for security reasons. We are mostly a Microsoft shop and we generally make sure we can't use Microsoft provided feature before going through the process of getting something else we are no reliant on. If I don't find something with everything I need, then I probably should just build what I need. I am happy to be pointed to any resources that can help me arrive at app-level caching best practice, either by vendor or custom build. The main thing I am working on right now is adding cache dependency functionality and some of kind of memory management in a MemoryCache wrapper since there is no scavenging process as in System.Web.Caching. Thx

I'm glad caching (MemoryCache) is getting some attention finally! I tried to read all comments, though I could probably miss some ideas.

I understand you are aiming for providing low level caching mechanism, but would you re-consider adding asynchronous GetOrSet? In most cases I can recall, it's always async I/O why we do caching - so having method like ValueTask<TValue> GetOrSet(...) with factory returning Task<TValue> (including override with TState etc.) would be beneficial (even for further libraries extending core functionality)

Another thing I was always missing on MemoryCache was simply getting list of cached keys - though I'm not sure if that would be generally useful feature.

per @ericstj, adding @dotnet/area-extensions-caching to help design this and give feedback.

Overall I like the design as it's very simple and clean (in contraty to a very cumbersome MemoryCache). I love the fact that it has been battle tested and built based on real-life experience from high scale services.

Our implementation maintains its state on set method, thus sparing some threadpool overhead over existing libraries.

Do I understand correctly that the set method performs the maintanance work in synchronous way and does not start any async background work?

So reading values from the cache is always very cheap, but modifications can be expensive?

The latency includes optional cost of maintaining telemetry state on RCache side.

It seems that these perf numbers come from a single-threaded microbenchmark. It would be great to benchmark RCache using our TechEmpower infra for the "caching platfrom" benchmark (cc @sebastienros)

@adamsitnik Yes, we don't spawn any background threads. All cleanup activity is done when setting values in the cache, with a fixed upper bound to the work being done. So if a bunch of stuff gets evicted, it might take several set operations to clean everything up. There's an explicit API that says "please clean everything up" but I think that has marginal utility in the real world.

In practice on a busy server, the cache ends up staying clean and not holding on to stale state for very long.

We do have some multithread benchmarks and they perform very well. We should include the results here.

Can I ask what the status of this is currently? The reason I ask is that HybridCache (new in .NET 9 preview 4) has had some feedback from multiple sources that they'd like a TKey version rather than just a string-keyed version (with some kind of string factory for the L2 IDistributedCache side of things), and obviously that requires a TKey L1 to be useful, and I'm wondering whether RCache might be useful for that scenario.

Side note: HybridCache can also work without L2, and provides stampede protection, which I notice was discussed above, so : mutually beneficial :p

Thanks for pinging @mgravell. I think that this proposal is currently on @geeknoid's plate to address some feedback before we can move it forward, but there are some other proposals he had that have higher priority. Ideally we wouldn't have 2 different proposals for similar-enough features, and instead see if there are ways to having a single proposal that meet the shared goals?

Hi @joperezr , maybe sharing the 2 proposals and goals here can get some ideas from the community.

If you're talking about HybridCache as the other proposal: that's a different thing, with different aims, different use-cases, and a different API - although HybridCache could consume the other. The only thing they share is the name "cache" :)

Got it, as you can tell I'm not very familiar with either of the proposals, but in general given both are in the same "area" we should just make sure that if we are planning to take both, that they would both fit well together in our overall caching story.

Hi all, any news on how this effort is going on? Is RCache (or any other name it will be called) still being worked on for the .NET 9 timeframe? Thanks!

@mobratil was looking into this. We have a working implementation of the API, but it has a problem if TValue is more than 64 bits. In that case, it can lead to struct tearing. Fixing this is unfortunately challenging.

I really would like us to get this out the door, but no guarantees yet...

Thanks for the update.

Fixing this is unfortunately challenging.

I can imagine, good luck!

One question, without knowing the details (so I may be wrong): is the perf difference with a more traditional approach so big to be worth how delicate it is?

@mobratil was looking into this. We have a working implementation of the API, but it has a problem if TValue is more than 64 bits. In that case, it can lead to struct tearing. Fixing this is unfortunately challenging.

After some experiments I had fixed this using seqlock. Reads remain lock free and reads cannot starve writes, so throughput under load remains very good and the overhead is minimal. The disadvantage of this approach is that it can live lock. In practice I wasn't able to provoke live lock as a failure mode (I tested a worst-case scenario across .NET4.8/6.0/8.0 on different CPUs - Intel X64, AMD X64, Mac M2 and Cobalt 100). It also increases the cache entry size by 4 bytes, but in my implementation the additional counter fits inside the cache entry padding for the common cases, so this seemed like a reasonable tradeoff.

@geeknoid will sync with you to see if this fits for RCache/this proposal.

One question, without knowing the details (so I may be wrong): is the perf difference with a more traditional approach so big to be worth how delicate it is?

Not sure if this was about the cache in general or the struct tearing specifically - struct tearing is a correctness problem rather than a perf issue, cache readers can see the result of partial writes leading to corrupted values (e.g. update a Guid, and a concurrent reader may see half of the previous Guid and half of the new one). When I tried to reproduce this, its trivially easy on a 32 bit build target. For 64 bit it seems like the struct must be larger than a cache line (64 bytes for X64). I suspect most people run on X64 with structs smaller than 64 bytes, so you would be unlikely to bump into this as a practical problem.

dotnet / extensions