awolverp / cachebox

The fastest memoizing and caching Python library written in Rust.
https://pypi.org/project/cachebox/
MIT License

Cache re-validation strategy to avoid cache stampede #15

Open Benoss opened 2 days ago

Benoss commented 2 days ago

I am trying to find a library that can handle both caching and cache re-validation.

For example, when the cached value is not available, I would like the first request to execute the function, but concurrent requests to wait until the first one has finished instead of executing the function as well.

This is to avoid a cache stampede (https://en.wikipedia.org/wiki/Cache_stampede). A good resource on the subject: https://grantjenks.com/docs/diskcache/case-study-landing-page-caching.html

Here is a simple example where I would expect only one concurrent API call at a time:

import cachebox
from cachebox import TTLCache
import httpx
from concurrent import futures
import time
import logging

logging.basicConfig(level=logging.DEBUG)

mycache = TTLCache(0, ttl=3)

@cachebox.cached(mycache)
def sync_call() -> dict:
    logging.info("Httpx Call")
    res = httpx.get("https://fakeresponder.com/?sleep=2000")
    data = res.json()
    return data

if __name__ == "__main__":
    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        for _ in range(1, 10):
            future_list = [executor.submit(sync_call) for _ in range(10)]
            for future in futures.as_completed(future_list):
                logging.info(f"got result: {future.result()}")

        time.sleep(1)
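
For reference, the kind of behaviour I am hoping for looks roughly like this hand-rolled version. This is only a sketch of the desired semantics with a plain threading.Lock and a made-up fetch_data() helper, not something cachebox exposes today:

# Sketch of stampede protection around a cachebox TTLCache (NOT cachebox API):
# a lock around the cache-miss path so only one thread runs the expensive call
# while concurrent callers wait for its result. fetch_data() stands in for the
# real HTTP request.
import threading
import time
import logging

from cachebox import TTLCache

logging.basicConfig(level=logging.INFO)

cache = TTLCache(0, ttl=3)
lock = threading.Lock()
_MISSING = object()

def fetch_data() -> dict:
    logging.info("expensive call")
    time.sleep(2)  # pretend this is the slow HTTP request
    return {"status": "ok"}

def cached_call() -> dict:
    value = cache.get("key", _MISSING)
    if value is not _MISSING:
        return value  # cache hit: no locking needed
    with lock:
        # Re-check inside the lock: another thread may have populated the
        # cache while we were waiting, so only the first caller does the work.
        value = cache.get("key", _MISSING)
        if value is _MISSING:
            value = fetch_data()
            cache["key"] = value
        return value
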
ecarrara commented 2 days ago

Hey @Benoss, I built a library called yapcache that does what you're looking for: it makes sure only one request executes the underlying function when the cache is empty, preventing stampedes. It uses the cachebox library under the hood.

# ...
from yapcache import memoize
from yapcache.caches import InMemoryCache
from yapcache.distlock import RedisDistLock

cache = InMemoryCache(maxsize=2_000)   # uses cachebox TTLCache

@memoize(
    cache,
    ttl=60,
    cache_key=lambda n: f"fn1-{n}",
    lock=lambda key: RedisDistLock(redis_client, key),
)
async def fn1(n: int):
    logging.info("Httpx Call")
    # Use the async client so the 2s upstream call doesn't block the event loop
    async with httpx.AsyncClient() as client:
        res = await client.get("https://fakeresponder.com/?sleep=2000")
    return res.json()
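
A hypothetical way to exercise it, assuming the elided imports and a redis_client are filled in where the # ... is: fire several concurrent calls for the same key, and only one of them should actually hit the API.

import asyncio

async def main():
    # Ten concurrent callers share the cache key "fn1-1"; with the lock in
    # place only one of them should execute fn1's body, the rest wait for
    # and reuse its result.
    results = await asyncio.gather(*(fn1(1) for _ in range(10)))
    logging.info("got %d results", len(results))

asyncio.run(main())
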
awolverp commented 1 day ago

Thank you for your issue ❤️ This will take a long time to implement; I'll do it whenever I get the chance.

Benoss commented 1 day ago

> Hey @Benoss, I built a library called yapcache that does what you're looking for: it makes sure only one request executes the underlying function when the cache is empty, preventing stampedes. It uses the cachebox library under the hood.

Thanks, I will give it a go. Love the name of the lib BTW