cristipufu / aspnetcore-redis-rate-limiting

Set up a Redis backplane for ASP.NET Core multi-node deployments, using the built-in Rate Limiting support that's part of .NET 7 and .NET 8.
MIT License
226 stars 23 forks source link

Rate limiter causes thread pool exhaustion #66

Closed hacst closed 1 year ago

hacst commented 1 year ago

When trying to load test the sliding window limiter I noticed that it quickly lead to redis timeouts especially in environments with limited cpu count and/or and a bit of latency to redis. This seems to be caused by exhausting the thread pool. Looking into the issue I found the following: The TryAcquireAsync function in RateLimitingMiddleware.cs first does a synchronous AttemptAcquire call before falling back to before falling back to AcquireAsync on the RateLimiter. This means unless something goes wrong, only synchronous StackExchange Redis calls are performed by this package.

Looking at the documentation on the RateLimiter class it says:

AcquireAsync(Int32, CancellationToken) Wait until the requested permits are available or permits can no longer be acquired.
AttemptAcquire(Int32) Fast synchronous attempt to acquire permits.

So this isn't just the usual case of having an async and a blocking implementation but meant to be distinct functions that can both be used.

I have to admit that I am not sure I fully get the intent of the interface with regards to waiting until a permit is available. But I think the way to get proper scalability would probably be to never return a lease from the synchronous call and only reach out to redis in the asynchronous one.

Would this be an appropriate change? Bit surprised I am the first one to stumble over this so maybe I am on the wrong track completely.

cristipufu commented 1 year ago

Yes, unfortunately the Microsoft API was not designed having in mind a possible I/O implementation of the rate limiters.

I'm not sure what would be the best way of moving forward with this issue.

If we never return a lease from the synchronous call, the rate limiter by itself wouldn't have a complete implementation.

Maybe implement another rate limiting middleware, that skips the call to the synchronous method?

hacst commented 1 year ago

Is it actually incomplete though? My interpretation would've been, that this limiter simply does not give a lease on the fast path. Is it required to?

Unfortunately I did not find any discussion or deeper documentation on that specific part of the api so you probably know better than me.

cristipufu commented 1 year ago

Same issue here: https://github.com/cristipufu/aspnetcore-redis-rate-limiting/blob/master/src/RedisRateLimiting/Concurrency/RedisConcurrencyRateLimiter.cs#L309

Discussion here: https://github.com/dotnet/runtime/issues/77669

Maybe it would help leaving a comment on the dotnet runtime issue as well

hacst commented 1 year ago

I see. Maybe this should even have a separate issue. Even if it only results in a clear documentation of the intent of the API.

kamilslusarczyk commented 1 year ago

I spent some time last week on that one and can confirm - both sync acquire & getting statistics are causing extensive thread pool exhaustion.

Using ThreadPool.SetMinThreads(x, y) as a ugly workaround works but is not a correct solution by any means.

hacst commented 1 year ago

My current workaround for the issue looks like this:

  public class AsyncOnlyRedisSlidingWindowRateLimiter<TKey> : RedisSlidingWindowRateLimiter<TKey>
  {
    public AsyncOnlyRedisSlidingWindowRateLimiter(TKey partitionKey, RedisSlidingWindowRateLimiterOptions options) : base(partitionKey, options) {}

    private sealed class NotAcquiredLease : RateLimitLease
    {
      public override bool IsAcquired => false;
      public override IEnumerable<string> MetadataNames => throw new NotImplementedException();
      public override bool TryGetMetadata(string metadataName, out object? metadata)
      {
        throw new NotImplementedException();
      }
    }

    protected override RateLimitLease AttemptAcquireCore(int permitCount)
    {
      return new NotAcquiredLease();
    }
  }

Obviously not ideal either but trying to using more threads does not scale for me. It seems to work for what I need right now though I am still hoping for an official solution.

cristipufu commented 1 year ago

We could overwrite these classes in the RedisRateLimiting.AspNetCore package 🤔

kamilslusarczyk commented 1 year ago

For reference: related issue created in MS's repo: https://github.com/dotnet/runtime/issues/88592

cristipufu commented 1 year ago

@hacst @kamilslusarczyk please take a look: https://github.com/cristipufu/aspnetcore-redis-rate-limiting/pull/94

hacst commented 1 year ago

@hacst @kamilslusarczyk please take a look: https://github.com/cristipufu/aspnetcore-redis-rate-limiting/pull/94

LGTM code wise. Should fix the exhaustion with the asp net core rate limiter Middleware. I would add a comment explaining why it has to be structured like this and why it works as that isn't so obvious.

As this doesn't touch the statistics code paths these would still suffer from thread pool exhaustion if an application used those extensively. But given the api I do not see an obvious way of fixing/working around that.

kamilslusarczyk commented 1 year ago

Sorry for delay on that one. I agree with @hacst - this indeed look a bit as a workaround so maybe an inline comment in a codebase would be nice. That said, I know that this PR is already merged and I'm late with my claims. : )