Closed hacst closed 1 year ago
Yes, unfortunately the Microsoft API was not designed having in mind a possible I/O implementation of the rate limiters.
I'm not sure what would be the best way of moving forward with this issue.
If we never return a lease from the synchronous call, the rate limiter by itself wouldn't have a complete implementation.
Maybe implement another rate limiting middleware, that skips the call to the synchronous method?
Is it actually incomplete though? My interpretation would've been, that this limiter simply does not give a lease on the fast path. Is it required to?
Unfortunately I did not find any discussion or deeper documentation on that specific part of the api so you probably know better than me.
Same issue here: https://github.com/cristipufu/aspnetcore-redis-rate-limiting/blob/master/src/RedisRateLimiting/Concurrency/RedisConcurrencyRateLimiter.cs#L309
Discussion here: https://github.com/dotnet/runtime/issues/77669
Maybe it would help leaving a comment on the dotnet runtime issue as well
I see. Maybe this should even have a separate issue. Even if it only results in a clear documentation of the intent of the API.
I spent some time last week on that one and can confirm - both sync acquire & getting statistics are causing extensive thread pool exhaustion.
Using ThreadPool.SetMinThreads(x, y)
as a ugly workaround works but is not a correct solution by any means.
My current workaround for the issue looks like this:
public class AsyncOnlyRedisSlidingWindowRateLimiter<TKey> : RedisSlidingWindowRateLimiter<TKey>
{
public AsyncOnlyRedisSlidingWindowRateLimiter(TKey partitionKey, RedisSlidingWindowRateLimiterOptions options) : base(partitionKey, options) {}
private sealed class NotAcquiredLease : RateLimitLease
{
public override bool IsAcquired => false;
public override IEnumerable<string> MetadataNames => throw new NotImplementedException();
public override bool TryGetMetadata(string metadataName, out object? metadata)
{
throw new NotImplementedException();
}
}
protected override RateLimitLease AttemptAcquireCore(int permitCount)
{
return new NotAcquiredLease();
}
}
Obviously not ideal either but trying to using more threads does not scale for me. It seems to work for what I need right now though I am still hoping for an official solution.
We could overwrite these classes in the RedisRateLimiting.AspNetCore package 🤔
For reference: related issue created in MS's repo: https://github.com/dotnet/runtime/issues/88592
@hacst @kamilslusarczyk please take a look: https://github.com/cristipufu/aspnetcore-redis-rate-limiting/pull/94
@hacst @kamilslusarczyk please take a look: https://github.com/cristipufu/aspnetcore-redis-rate-limiting/pull/94
LGTM code wise. Should fix the exhaustion with the asp net core rate limiter Middleware. I would add a comment explaining why it has to be structured like this and why it works as that isn't so obvious.
As this doesn't touch the statistics code paths these would still suffer from thread pool exhaustion if an application used those extensively. But given the api I do not see an obvious way of fixing/working around that.
Sorry for delay on that one. I agree with @hacst - this indeed look a bit as a workaround so maybe an inline comment in a codebase would be nice. That said, I know that this PR is already merged and I'm late with my claims. : )
When trying to load test the sliding window limiter I noticed that it quickly lead to redis timeouts especially in environments with limited cpu count and/or and a bit of latency to redis. This seems to be caused by exhausting the thread pool. Looking into the issue I found the following: The
TryAcquireAsync
function in RateLimitingMiddleware.cs first does a synchronousAttemptAcquire
call before falling back to before falling back toAcquireAsync
on the RateLimiter. This means unless something goes wrong, only synchronous StackExchange Redis calls are performed by this package.Looking at the documentation on the
RateLimiter
class it says:So this isn't just the usual case of having an async and a blocking implementation but meant to be distinct functions that can both be used.
I have to admit that I am not sure I fully get the intent of the interface with regards to waiting until a permit is available. But I think the way to get proper scalability would probably be to never return a lease from the synchronous call and only reach out to redis in the asynchronous one.
Would this be an appropriate change? Bit surprised I am the first one to stumble over this so maybe I am on the wrong track completely.