Eager cache refresh - Githubissues

matdurand commented 4 months ago

Hi there,

Amazing library!

A possible use case I have is that I will be using your lib with Redis and multiple instances of my app running. I assume I will get limited benefits from the in-flight requests tracking (stampede and dedup) across all the running instances of my app. One thing I did in the past to alleviate the issue is to do eager cache refresh of some keys about to expire, but in a staggered manner so not all instances are doing the refresh at the same time.

Example. I cache a key for 60mins. Then I specify an eager refresh delay (it was per key in my case, but it could be for the whole cache if it make sense). Let's say in this example, it's 10mins. That means that if I request the key between min 50 and 60, the cache will serve the cached value, but there is a chance to trigger a refresh in the background. The percentage of chances to trigger a request depends on the length of time remaining before expiration (0-10 mins in this example) and the frequency of calls for this key. A key not frequently requested would get a high chance of being async-refreshed sooner than later since we won't have a lot of chances to refresh it.

What do you think about this feature?

creativecreature commented 4 months ago

Hi, and thank you for the kind words!

We are currently testing the library with Redis ourselves using a pre-release from this branch: https://github.com/creativecreature/sturdyc/pull/13. The PR adds a WithDistributedStaleStorage option, which seems to match what you need. You will be able to set a 1-hour TTL in Redis and then configure the cache to refresh the records after 10 minutes have passed.

If you only want one container to refresh the key during this time, you could use the WithBackgroundRefreshes option. To receive only one refresh per Redis key on average, you would need to ensure that the difference between the minimum and maximum refresh times equals the time it takes to fetch the data from the underlying data source multiplied by the number of containers you're running

creativecreature commented 4 months ago

If you want, you could use go get github.com/creativecreature/sturdyc@c144c6e853bd48b5ce5cb56510958d5ef6541225 to download the latest commit of that branch, and then test it with a Redis client to see if you're able to achieve what you want:

import (
    "context"
    "math/rand/v2"
    "time"

    "github.com/redis/go-redis/v9"
)

const (
    minTTL = 3 * time.Hour
    maxTTL = 6 * time.Hour
)

type RedisClient struct {
    redis *redis.Client
}

func NewRedis(host string) *RedisClient {
    redis := redis.NewClient(&redis.Options{ Addr: host })
    return &RedisClient{ redis: redis }
}

func randomTTL() time.Duration {
    return minTTL + time.Duration(rand.Int64N(int64(maxTTL-minTTL)))
}

func (c *RedisClient) Get(ctx context.Context, key string) ([]byte, bool) {
    res := c.redis.Get(ctx, key)
    if res == nil {
        return []byte{}, false
    }
    val, err := res.Bytes()
    if err != nil {
        return []byte{}, false
    }
    return val, true
}

func (c *RedisClient) Set(ctx context.Context, key string, value []byte) {
    c.redis.Set(ctx, key, value, randomTTL())
}

func (c *RedisClient) GetBatch(ctx context.Context, keys []string) map[string][]byte {
    cmd := c.redis.MGet(ctx, keys...)
    if cmd == nil {
        return map[string][]byte{}
    }

    records, err := cmd.Result()
    if err != nil {
        return map[string][]byte{}
    }
    hits := make(map[string][]byte, len(keys))
    for i, key := range keys {
        if records[i] != nil {
            hits[key] = []byte(records[i].(string))
        }
    }
    return hits
}

func (c *RedisClient) SetBatch(ctx context.Context, records map[string][]byte) {
    for key, value := range records {
        c.redis.Set(ctx, key, value, randomTTL())
    }
}

func (c *RedisClient) Delete(ctx context.Context, key string) {
    c.redis.Del(ctx, key)
}

func (c *RedisClient) DeleteBatch(ctx context.Context, keys []string) {
    c.redis.Del(ctx, keys...)
}

This Redis client should satisfy the DistributedStaleStorage interface, thus making it possible to pass it to the WithDistributedStaleStorage option.

I'm going to add more documentation and a more complete example once we've been able to verify that everything works as intended!

matdurand commented 4 months ago

I'm not sure I understand the semantic of the background refresh terms. Given an example where I have 3 instances, a key TTL of 60 mins, and assuming the call to get the data takes 100ms, what would be the values for the WithBackgroundRefreshes min/max?

creativecreature commented 4 months ago

The WithBackgroundRefreshes option determines how often the in-memory cache should refresh the keys. Using something like this:

WithBackgroundRefreshes(time.Second, time.Second * 30, time.Second)

would make your containers refresh any given key when it is requested again after a randomized threshold between 1 and 30 seconds. Randomizing the thresholds spreads out the traffic evenly over time. If it were a fixed duration, it would lead to a comb-like request graph.

So it might result in the following for any given key:

Container 1: 1 second
Container 2: 14 seconds
Container 3: 27 seconds

And, pairing this with the WithDistributedStaleStorage option:

WithDistributedStaleStorage(redisClient, time.Minute)

makes the in-memory cache check the distributed storage first. If the value there is older than 1 minute, it will access the underlying data source, then write the value to both the in-memory cache and Redis. When it writes to the in-memory cache, it will receive another random refresh threshold between 1 and 30 seconds into the future.

Therefore it's highly unlikely that two containers would refresh the same key in Redis because they would have to be randomized within 100 milliseconds of one another.

creativecreature commented 4 months ago

But the WithBackgroundRefreshes is essentially an eager update. It allows you to set a high TTL, while the cache will attempt to refresh a key earlier if it’s requested again after that period of time has passed. We’re running the cache like that in production to be able to serve stale if an upstream goes down.

matdurand commented 4 months ago

Assuming your random distribution from above (assuming all 3 containers received a request for that same key again)

Container 1: 1 second
Container 2: 14 seconds
Container 3: 27 seconds if Container 1 refreshes the key after 1 sec, and then 2 and 3 come after with their own refreshes, it seems like this might lead to over-fetching. Is there a way to avoid the "eager" part and have only 1 container on average refreshing a single key? (or maybe I misunderstood the effect of WithDistributedStaleStorage?)

matdurand commented 4 months ago

Another question I have would be...

makes the in-memory cache check the distributed storage first

So the data is in-memory first, and the in Redis? is there a way to limit the in-memory scope and start dropping in-memory keys if it gets too big?

About my last reply, he is a more complete example. Did I get that right?

We have 2 containers, with background refresh configured to min:60sec/max:120sec, and using WithDistributedStaleStorage(redisClient, time.Minute)

At 10 secs, we store key1 with TTL 60 mins from Container 1, and then the value is also written in Redis.
At 12 secs, we get key1 from Container1. It retrieves the value from memory and nothing else happen
At 61 secs, we get key1 from Container1. It retrieves the value from memory and trigger a background refresh with a randomized value of 67 secs.
At 62 secs, we get key1 from Container2. It retrieves the value from Redis, since it's not in memory yet, and nothing else happen.
At 64 secs, we get key1 from Container2. It retrieves the value from memory, and nothing else happen, since the background refreshes min is 60 sec (from last retrieval?)
At 67 secs, Container 1 backgroud refresh kicks in. The new value is stored in memory and then in Redis.

I have no idea where the WithDistributedStaleStorage config factors in this example.

creativecreature commented 4 months ago

So the duration you pass to WithDistributedStaleStorage is going to control how frequently the underlying data source is visited. If you're setting it to a minute, you'll never have two requests within that period unless your containers randomizes a refresh time within milliseconds of one another.

The min and max durations you pass to WithBackgroundRefreshes controls how frequently the in-memory cache tries to refresh the data. With the WithDistributedStaleStorage it's always going to try that data source first, and then only visit the "real" source if the data is older than the duration you specified. If that request would fail the cache will fallback to the value it got from redis even if its older than a minute.

is there a way to limit the in-memory scope and start dropping in-memory keys if it gets too big?

The values you specify when you create the client controls this:

capacity := 1000
numberOfShards := 10
ttl := time.Hour
evictionPercentage := 30
cacheClient := sturdyc.New[string](capacity, numberOfShards, ttl, evictionPercentage)

The cache will evict evictionPercentage once the capacity is reached and you perform another write. So for this example when you're writing record number 1001, it would evict 300 items before it performs that write.

Having a cache that evicts on every write is going to slowdown your system so I would advice that you evict at least 10%. The evictions happen at O(N) time complexity using quick select and is based on recency

creativecreature commented 4 months ago

I'm going to close this issue because I believe the functionality you want is already possible to achieve. However, I think this discussion was really valuable because it made me reconsider the API.

I decided to rename the following APIs:

WithBackgroundRefreshes => WithEarlyRefreshes WithRefreshBuffering => WithRefreshCoalescing WithDistributedStaleStorage => WithDistributedStorageEarlyRefreshes

I've merged these changes to main, but I haven't created a new release yet. I will do so as soon as I find the time to write the release notes. However, you can install the package using the latest commit on that branch and have a look at the README for some examples.

I think these will be particularly interesting for your use case:

creativecreature / sturdyc

Eager cache refresh #14