App-vNext / Polly

Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. From version 6.0.1, Polly targets .NET Standard 1.1 and 2.0+.
https://www.thepollyproject.org
BSD 3-Clause "New" or "Revised" License

Feature request: rate-limiting (requests/time) #260

Closed: georgiosd closed this issue 2 years ago

georgiosd commented 7 years ago

Hi :)

I'm enjoying Polly very much, good work!

Would love something like this: http://www.jackleitch.net/2010/10/better-rate-limiting-with-dot-net/ (only something that works better than that) :)

I've used that RateGate to connect to APIs that ban you if you send more than x requests in y seconds, and it doesn't work too well; it leaks here and there.

joelhulen commented 7 years ago

@georgiosd We're glad you're enjoying Polly :)

Have you looked into using Polly to handle TResult?

There are typical HTTP results that are returned once you hit a rate limit or a server is busy (such as 429). Some API providers even return header values that tell you how long to wait before retrying. In fact, @reisenberger wrote a blog post last month on the subject. There's also much discussion around the topic here.
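
For illustration, a minimal sketch of that TResult approach: a Polly wait-and-retry policy that handles 429 responses and honours the server's Retry-After header. The retry count and the exponential fallback are illustrative assumptions, not anything prescribed in this thread.

var handle429 = Policy
    .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode == 429)
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: (attempt, outcome, context) =>
            // Prefer the server-specified Retry-After; otherwise back off exponentially.
            outcome.Result.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(Math.Pow(2, attempt)),
        onRetryAsync: (outcome, delay, attempt, context) => Task.CompletedTask);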

georgiosd commented 7 years ago

Actually, for the use cases I'm thinking about (cryptocurrency exchange APIs), hitting the limit would be bad: the goal is to poll data as quickly as possible, and if you get rate limited there is a penalty that could make me miss an important event.

Makes sense? :)

joelhulen commented 7 years ago

Ok, I understand now. I thought you were dealing with HTTP requests where you just have to back off for a bit. In your case, you can't afford to hit the limit. So do you know the rate limits up front? Can you say, with absolute certainty, "I can send x requests within a given n-minute window"? Also, is your application multi-threaded, with multiple instances potentially calling this API at once, requiring you to share your API hit count across instances?

georgiosd commented 7 years ago

Correct. The limits are published (though often inaccurately).

It is multi-threaded because there are several kinds of updates that need to happen at once (trade updates, account updates). I usually have an event loop running in a Task fetching each one. So the rate gate would effectively throttle the event loops.

reisenberger commented 7 years ago

Hi @georgiosd , thanks for joining the Polly conversation!

I've looked at Jack Leitch's article in the past, as we've had discussion around a rate-limiting Policy in our Slack channel. You mentioned "it leaks here and there": if you have any specifics on that (what you thought was causing the leak, or just the circumstances), it would be good to hear more, so that we can consider it in any Polly implementation.

One evident issue: any rate-limiter whose approach is to hold back 'hot' tasks - tasks which are already executing, in memory - is intrinsically vulnerable to memory bulges if fresh requests consistently outstrip the permitted rate, simply because those pending requests are all held in memory. At least, that is so in the absence of any co-operative demand control (back-pressure) or load-shedding. I've noted this in more detail in the Polly Roadmap. Is this the kind of thing you were thinking of with the 'leaks', or did you detect leaks even in relatively steady state (i.e. without large numbers of pending requests backing up)?

Thanks!

georgiosd commented 7 years ago

Hey @reisenberger, thanks for checking in.

I must say that this was a few months ago, so my recollection is somewhat hazy. I remember that I was hitting the rate limits all the time, so I decided to print out the timestamps of the rate-limited requests, and some of them were out of whack.

i.e. let's say the fastest I can call the API is every 2s; the intervals would be like:

2s
1.99s
2.01s
0.5s
2s

You get the idea.

I tried to figure out what could be causing it, but I couldn't understand well enough what the code does.

I also don't think your second point applies to my particular use case; it's much simpler than that. Basically there are 1-5 tasks/event loops running concurrently, making HTTP calls to the same API. I must be sure that (a) none of them is allowed to execute faster than the permitted rate, and (b) there is some fairness in entering the critical section. Obvious, I guess, as you wouldn't want any of the event loops to "starve" (wait forever, or for a long time relative to the other event loops).

Makes sense? Happy to provide any more details.

reisenberger commented 7 years ago

Hi @georgiosd . Re:

I also don't think your second point applies to my particular use case ... Basically there are 1-5 tasks/event loops running concurrently

With you there :+1: (I suspected that might be the setup from your initial description, but it is good to have it confirmed in more detail).

Re:

tried to figure out what could be causing it

The only observation I can make is that, if those up-to-five separate event loops are hitting the same API and need (combined) to not exceed the rate limit, then they'd need (from my code reading) to share the same RateGate instance. (The same would also apply for the way I envisage we could implement this as a Polly RateLimit policy.)
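
A minimal sketch of that shared-instance arrangement, assuming the RateGate API from Jack Leitch's article; the loop body and the 1-per-2-seconds rate are illustrative:

// One gate shared by all event loops: at most 1 call per 2 seconds across all of them.
static readonly RateGate SharedGate = new RateGate(1, TimeSpan.FromSeconds(2));

static async Task EventLoopAsync(Func<Task> fetchUpdates)
{
    while (true)
    {
        SharedGate.WaitToProceed(); // blocks until the shared rate allows another call
        await fetchUpdates();
    }
}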

Testing the multi-threaded case robustly is certainly something we should do if we implement a Polly RateLimit policy.

georgiosd commented 7 years ago

No problem! I forgot to mention: the "leak" is that 0.5s delay amongst the pool of 2s ones.

And yes, the RateGate was shared between event loops.

reisenberger commented 7 years ago

:+1: @georgiosd Thanks for the clarification and feedback!

georgiosd commented 7 years ago

Hey @reisenberger - I resurrected the project that needs this, so I was just wondering whether this is on the roadmap somewhere? :)

reisenberger commented 7 years ago

Hi @georgiosd . This is on the roadmap, but there isn't any allocated resource or timescale. For the core maintainers, some other large-ish items are ahead at the mo (CachePolicy; unifying sync/async policies; perf enhancements).

We'd love to see it implemented, however, if you (or anyone else) are interested in working on a PR. Shout if so, and we can provide guidance on how to structure an implementation as a Polly policy!

georgiosd commented 7 years ago

It's not my core strength (which is why I wasn't able to find what was wrong with the RateGate and fix it in the first place), but if you can help out with the actual logic, I'd love to contribute.

reisenberger commented 7 years ago

Understood @georgiosd . When I/somebody can squeeze an implementation out, it would be awesome to have you put it through its paces! (or any other contributions!). As prev. noted, other development priorities are (unfortunately) ahead of this feature at the mo.

georgiosd commented 7 years ago

If you give me some steps/implementation guidance, I can give it a try!

reisenberger commented 7 years ago

Hi @georgiosd . The best way to see the architecture of how a new Policy is implemented is to look at the shape of the files making up NoOpPolicy (and its tests). NoOpPolicy is just an (intentionally) empty policy which does nothing, so it shows you the bare-bones structure you would start with for adding a new policy.

https://github.com/App-vNext/Polly/tree/master/src/Polly.Shared/NoOp

https://github.com/App-vNext/Polly/pull/214/files

georgiosd commented 7 years ago

I'm not so much worried about how you make a policy, but rather about how you make a reliable rate limiter of this kind :)

reisenberger commented 6 years ago

Cross-ref #330

Matthewsre commented 6 years ago

Just came over to request this feature and see it is already on the roadmap. Would love to see this implemented.

phatcher commented 6 years ago

Just chipping in with a couple of thoughts...

Basically this looks to me like a classic producer/consumer pattern: requests arrive at some rate, and each can either be executed immediately or must be queued up because we are in danger of breaching the rate limit.

One open question: do we regularise the flow (i.e. constant time between each request), or do we let requests flow as fast as possible until we get close to the limit?

These are the token-bucket/leaky-bucket concepts.

There's a C# implementation of this at TokenBucket which I think could be adapted to this purpose. It might end up as more than one policy, depending on what behaviour you want: e.g. discard silently, discard and throw, etc.

reisenberger commented 5 years ago

x-ref #528

rahulrai-in commented 5 years ago

Hi, I would like to take this up and contribute; however, I am a little confused by the discussion up to this point. Is this issue still open, and what is the conclusion with respect to implementing this feature?

reisenberger commented 5 years ago

Hi @rahulrai-in . Thank you for your offer to take this up! This is still up-for-grabs.

I took a couple of hours to sketch some code ideas that were kicking around my head. I have pushed these to a branch: https://github.com/reisenberger/Polly/tree/reisenberger-rateLimiterSketch

Please feel free to build on this. Is it useful? Is it missing any important angle?

I'll try to post some further notes shortly about the code in that ^ branch.

reisenberger commented 5 years ago

Some notes about https://github.com/reisenberger/Polly/tree/reisenberger-rateLimiterSketch, as promised:


This feature is not a priority for my own projects, but I am happy to collaborate/guide/help as much as I can for those who need it in their own projects to implement. Thank you @rahulrai-in for your interest! Do you think this is a useful starting point to take forward? Or do you have other ideas/thoughts/questions about how it could be done? And: same questions to anyone else interested! Thanks.

reisenberger commented 5 years ago

A final piece of explanation, about the Func<TimeSpan, Context, TResult> retryAfterFactory in https://github.com/reisenberger/Polly/tree/reisenberger-rateLimiterSketch:

The proposed syntax allows the user to (optionally) configure the policy with a Func<TimeSpan, Context, TResult> retryAfterFactory. This allows the user to specify - for the case when a call is rate-limited - how they would like the retry-after expressed.

To give an example, for a policy guarding calls returning HttpResponseMessage, a user could use the factory in an obvious way like:

// Requires System.Net, System.Net.Http, System.Globalization, and Polly (for Context).
Func<TimeSpan, Context, HttpResponseMessage> retryAfterFactory = (span, context) =>
{
    // Build the same shape of response a rate-limiting server would send.
    var response = new HttpResponseMessage
    {
        StatusCode = (HttpStatusCode) 429,
        ReasonPhrase = "Too Many Requests",
        Content = new StringContent(string.Format(CultureInfo.InvariantCulture, "Rate limit reached. Retry in {0} seconds.", span.TotalSeconds))
    };

    // Advertise the wait in the standard Retry-After header.
    response.Headers.Add("Retry-After", span.TotalSeconds.ToString(CultureInfo.InvariantCulture));

    return response;
};

The idea of this Func<TimeSpan, Context, TResult> retryAfterFactory is that it could be useful both for those imposing a rate-limit (as a service-provider wanting to rate-limit callers) and for those wanting to abide by a rate-limit they know a downstream system will impose on them (perhaps because they prefer, or need, not to breach the limit, so that they don't get blacklisted for a period). If you want to abide by a rate-limit which you expect the called system to impose, it can be very convenient to build a rate-limit policy that limits you by providing responses in exactly the same style as the downstream system would. That way, you only have to build one WaitAndRetry(...) policy to respond to those "you've been rate-limited" signals.
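
A hedged sketch of that "one WaitAndRetry policy" idea, combining the sketch branch's rate-limit policy with a standard Polly retry. The placement of retryAfterFactory in the RateLimitAsync call is an assumption about the sketch branch's syntax; the wrap itself is standard Polly.

// Rate-limit policy built with the retryAfterFactory above (argument placement assumed).
var rateLimit = Policy.RateLimitAsync<HttpResponseMessage>(20, TimeSpan.FromSeconds(1), retryAfterFactory);

// One retry policy honours the Retry-After signal, whether it came from the
// local rate-limit policy or from the downstream system itself.
var conform = Policy
    .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode == 429)
    .WaitAndRetryAsync(
        3,
        (attempt, outcome, context) => outcome.Result.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(1),
        (outcome, delay, attempt, context) => Task.CompletedTask)
    .WrapAsync(rateLimit);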

aprooks commented 5 years ago

Hi @reisenberger, I've also started prototyping but you've come much further.

My only concern is with the retry-after factory. Won't it be misleading if one instance of a service sends a retry-after error while another one is free? I mean, I would only add it for a distributed token bucket, so that retry-after could be used as a contract of the service, and keep the local token bucket as a defence mechanism, for example for when a load balancer is misconfigured and sends too many requests to one instance.

The only thing I would add is the number of tokens returned as a configuration value, so that I could allow burst capacity that fills up slowly. For example, the bucket must hold no more than 100 tokens, and gain 20 tokens each second.

There are lock-based and lock-free implementations of IRateLimiter.

The lock-based implementation is much more straightforward to read. I agree only one should be part of the main project, but I'd choose which one to keep only after testing.

How would you like this work to be continued: as a PR to the main repo, or as a contrib project? And are the missing tests the only core piece still missing?

rahulrai-in commented 5 years ago

Thanks, @reisenberger, that's a good deal of information for me to start with. I will review the changes and ask further questions. @aprooks, just in the interest of avoiding duplication of effort, are you still working on the ticket, or can I pick this up?

aprooks commented 5 years ago

@rahulrai-in feel free to pick it up. I don't have much time to work on it, so I'd limit myself to reviews, tests, or other small tasks which won't block anyone.

rahulrai-in commented 5 years ago

@rahulrai-in feel free to pick it up. I don't have much time to work on it, so I'd limit myself to reviews, tests, or other small tasks which won't block anyone.

Thanks. I will keep updating progress in this thread.

reisenberger commented 5 years ago

Hi both @aprooks and @rahulrai-in . Thanks for the collaboration and great comments!

For example, the bucket must hold no more than 100 tokens, and gain 20 tokens each second.

I think that is covered (but let me know if you agree, or if I have misunderstood and missed something). So, with the sketch prototype it is possible to configure:

var rateLimiter = Policy.RateLimitAsync<TResult>(
    numberOfExecutions: 20,
    perTimeSpan: TimeSpan.FromSeconds(1),
    maxBurst: 100);

This means that you are limited to an average rate of 20 executions per second. In fact, that is translated into a single token being added to the bucket every 1/20th of a second. So if you are exceeding that (such that you empty your bucket), you will be throttled back to one execution per 1/20th of a second. But the bucket capacity is 100 tokens. So that is the maximum burst that is permitted. If you had made no executions for 5 seconds (or more), you would have the max, 100 tokens, in the bucket, so then you could do 100 executions in a single instant/burst.

If the user wants pure traffic-smoothing, they set maxBurst to 1: traffic will be smoothed to never more than 1 execution per 1/20th of a second. Very smooth. If the user wants 20 in any second (where up to 20 in close succession within a second is ok), they set maxBurst to 20. If they want a bigger burst to be possible (but forcing back to the average rate after the burst), they set e.g. maxBurst: 100, as discussed above.

Is that the kind of feature you were thinking of, @aprooks, or did I misunderstand?
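
For readers following the arithmetic: a minimal lock-based token-bucket sketch of the behaviour described above (illustrative only; this is not the sketch branch's actual implementation):

sealed class TokenBucket
{
    private readonly object _sync = new object();
    private readonly double _capacity;         // maxBurst
    private readonly double _refillPerSecond;  // numberOfExecutions / perTimeSpan, e.g. 20 per second
    private double _tokens;
    private DateTime _lastRefill = DateTime.UtcNow;

    public TokenBucket(double capacity, double refillPerSecond)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity; // start full, permitting an initial burst of up to maxBurst
    }

    public bool TryTake()
    {
        lock (_sync)
        {
            var now = DateTime.UtcNow;
            // Top up in proportion to elapsed time, capped at the bucket capacity.
            _tokens = Math.Min(_capacity, _tokens + (now - _lastRefill).TotalSeconds * _refillPerSecond);
            _lastRefill = now;

            if (_tokens < 1) return false; // rate-limited
            _tokens -= 1;
            return true;
        }
    }
}

With capacity 100 and a refill rate of 20 per second, this reproduces the example above: an idle bucket permits a 100-execution burst, then throttles back to 20 per second.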

aprooks commented 5 years ago

thanks for the explanation, I missed this part https://github.com/reisenberger/Polly/commit/6aa1e2a4c8667129424689374fbab98c996d6c13#diff-090e9c5495c1637766d1c7659cce542dR125

So on the policy-configuration side, this looks exactly like I thought it should be.

And unless someone would like to have more than 1 operation per tick, it should work :)

reisenberger commented 5 years ago

Hi @aprooks This is a great question:

the retry-after factory: Won't it be misleading if one instance of a service sends a retry-after error while another one is free?

This raises a great discussion about how to scope RateLimitPolicy instances (we can add this to the doco later).

A single RateLimitPolicy instance governs calls placed through it to the specified rate. But it is then up to you (as the user) how you scope/deploy your instances of RateLimitPolicy to achieve different effects. So, as with scoping CircuitBreaker and Bulkhead policies, you can (a) share a single instance across multiple call sites, or (b) use a separate instance per distinct concern:


(a) could be very appropriate when rate-abiding as the caller. For instance, if a google API limits you to N calls per minute across all the google APIs, and you call three different google APIs at three places in your code, you can share the same RateLimitPolicy instance to ensure that your calls to the google APIs are (taken together) limited to N per minute.

(a) could also be appropriate when rate-enforcing as a service provider. You maintain a RateLimitPolicy instance per customer (it is fairly lightweight: 3 object references and four int64 fields). But you might handle calls from that customer simultaneously on multiple threads, so each usage pulls the same shared RateLimitPolicy instance representing that customer's usage.
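
A hedged sketch of that per-customer scoping; the RateLimitAsync overload and all names here are assumptions, and the shared-instance lookup is the point:

// One policy instance per customer, shared by every thread handling that customer's calls.
// (GetOrAdd may invoke the factory more than once under a race; that is acceptable for a
// lightweight policy object, as only one winner is ever stored and shared.)
static readonly ConcurrentDictionary<string, IAsyncPolicy<HttpResponseMessage>> PoliciesByCustomer =
    new ConcurrentDictionary<string, IAsyncPolicy<HttpResponseMessage>>();

static IAsyncPolicy<HttpResponseMessage> GetPolicyFor(string customerId) =>
    PoliciesByCustomer.GetOrAdd(
        customerId,
        _ => Policy.RateLimitAsync<HttpResponseMessage>(20, TimeSpan.FromSeconds(1)));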

With this case (multiple instances of a downstream system):

Won't it be misleading if one instance of a service sends a retry-after error while another one is free

I guess it depends whether you know (as caller) which downstream instance you are calling. If you do, you adopt strategy (b): a separate instance of RateLimitPolicy per instance-of-downstream-system, with each RateLimitPolicy instance maintaining a rate-limiting record for that downstream node. It is like the discussion of separate circuit-breaker instances for separate nodes here.

If you don't know which downstream instance will be selected - if it is opaque to the calling code, behind a load-balancer which exposes a single endpoint - there is not much an upstream rate-limit policy can do to help with that situation, as you say. It is up to the load-balancer to distribute load effectively.

Again: did I understand correctly, or miss anything? And: thanks for the great thoughts/questions!

aprooks commented 5 years ago

That was more than I expected. Thank you!

Misiu commented 5 years ago

@aprooks any progress? Something we can test? I currently use a custom HttpClientFactory that creates an HttpClient with my own DelegatingHandler (it uses a SemaphoreSlim to limit requests per minute), but I would like to replace my custom solution with Polly.
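
For context, a hedged sketch of the kind of handler being described (illustrative, not the actual code): each request takes a permit, and the permit is returned one minute later, capping how many requests start in any rolling minute.

// Requires System, System.Net.Http, System.Threading, System.Threading.Tasks.
sealed class RateLimitingHandler : DelegatingHandler
{
    private readonly SemaphoreSlim _permits;

    public RateLimitingHandler(int requestsPerMinute)
        => _permits = new SemaphoreSlim(requestsPerMinute, requestsPerMinute);

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await _permits.WaitAsync(cancellationToken);

        // Return the permit one minute after it was taken, so at most
        // requestsPerMinute requests start in any rolling one-minute window.
        _ = Task.Delay(TimeSpan.FromMinutes(1))
                .ContinueWith(t => _permits.Release(), TaskScheduler.Default);

        return await base.SendAsync(request, cancellationToken);
    }
}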

aprooks commented 5 years ago

Hi, I thought there was someone else picking it up. I will try to push what I have in progress this weekend.

Misiu commented 5 years ago

@aprooks sorry, I missed your message. @rahulrai-in, have you made any progress on this?

rahulrai-in commented 5 years ago

Apologies, I was quite sloppy with this issue. I did take a look at the initial code built by @reisenberger, and did start with it, but never progressed much from there. I will get back to it soon, or maybe collaborate with @aprooks on it.

reisenberger commented 5 years ago

Coincidentally, I had reason to pursue this at the end of last week. I've just pushed some updates from last week to this branch. (Also very happy to see what anyone else has - all ideas are good!)

I am still reflecting on what the next/further steps might be (one for sure: I want to change the underlying timer mechanism). Any thoughts on any aspect are welcome.

aprooks commented 5 years ago

Thanks for the update. I was more interested in usage scenarios and testing, so please let me know when you finish with the internals, and I will review/test it.

reisenberger commented 5 years ago

Np. I meant to add (for the general readership): it doesn't have to be me who takes this further; I just wanted to share what I had got. I may not get back to it for a while / may pick at it at intervals, but it should still be considered 'up-for-grabs' for anyone who wants it more urgently.

aprooks commented 5 years ago

@reisenberger I've taken a look at your branch and don't see anything missing besides documentation. What do you think of creating a PR based on your branch, so that it's easier to review and to spot any missing parts?

reisenberger commented 5 years ago

Thanks @aprooks for the second-read of the code!

I have some unresolved questions about fairness in the high-frequency async case. Referring to the typical two use cases for a RateLimitPolicy:

(a) rate-imposing - just reject executions exceeding the defined rate

(b) rate-conforming - queue-and-retry executions exceeding the defined rate, in order to conform to a rate imposed on us

... the RateLimitPolicy currently in that branch implements (a). It was intended that (b) could simply be a wait-and-retry wrapper around it, using existing Polly retry policies to wait for the RetryAfter amounts returned by policy (a).

I've re-discovered that the resolution of await Task.Delay(...) for the async case on Windows platforms is limited to 1/64 second, the often-quoted 15.6ms (and it varies quite widely for durations around that size). The unresolved question: say a user configures a policy permitting 100 calls/second, and that, for some period, requests arrive above that rate and are queued with retries. The inability of await Task.Delay(...) to wait accurately for less than 15.6ms could lead to unfairness (later-arriving requests getting to execute before earlier-arriving requests). If some call-paths get pushed into waits of ~15.6ms when in reality they should wait much less, other execution requests arriving during that wait will steal execution slots.

The concern is not so much of occasional unfairness (which might broadly average out); we don't have to offer a guarantee of absolute fairness. The question in my mind is whether some call-paths in that use case could experience any extreme unfairness: executions which get repeatedly pushed into wait-and-retry loops (potentially for several seconds if they keep being usurped, but at least for significantly longer than expected), while many others queue-jump ahead.

We could just document that such high frequencies are not supported. Or we could do better - for example, with some internal, underlying FIFO queue of waiters. At this stage I have a reasonably promising prototype for the FIFO/fairness approach.
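
For concreteness, a minimal sketch of such a FIFO queue of waiters (illustrative; not the prototype's actual code): waiters complete strictly in arrival order as the limiter frees slots, so coarse Task.Delay resolution cannot let a later arrival usurp an earlier one.

// Requires System.Collections.Concurrent and System.Threading.Tasks.
sealed class FifoWaiterQueue
{
    private readonly ConcurrentQueue<TaskCompletionSource<bool>> _waiters =
        new ConcurrentQueue<TaskCompletionSource<bool>>();

    // Callers await this instead of Task.Delay; completion order == arrival order.
    public Task WaitAsync()
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _waiters.Enqueue(tcs);
        return tcs.Task;
    }

    // Called by the rate-limiter each time an execution slot becomes available.
    public void ReleaseOne()
    {
        if (_waiters.TryDequeue(out var tcs))
            tcs.TrySetResult(true);
    }
}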

reisenberger commented 5 years ago

@aprooks @Misiu @rahulrai-in @georgiosd / anyone: I opened a draft PR (as requested) at #666 . All review / comments / questions welcome!

SeanFarrow commented 4 years ago

Did this get any further? I'm in a situation where I will need this in the new year; is there anything I can do to help?

reisenberger commented 4 years ago

I have had no bandwidth to look at this during the autumn. I aim to triage it early in the New Year (if not before). Thank you for the offer to help.

SeanFarrow commented 4 years ago

Just wondering what is left to do to make this production-ready? I have a project that means I need this, so I could have some bandwidth next week. Thanks, Sean.

reisenberger commented 4 years ago

Hi @SeanFarrow . I haven't been able to get to this this week, but will triage it in early 2020, as promised, if not before. Thanks!

SeanFarrow commented 4 years ago

Hi Dylan,

No worries, I don't need this until mid-Jan. Thanks, Sean.

reisenberger commented 4 years ago

Triage as previously promised. As mentioned elsewhere, I see rate-limiting as two distinct use-cases:

(a) being the rate-imposing party, for example a called system imposing a rate-limit on those who call (you want simply to reject executions exceeding the rate-limit)

(b) being the rate-conforming party, for example being a caller of a system imposing a rate-limit (rather than reject, you want to queue executions to abide by the rate-limit)

The implementation in the WIP spike is largely production-ready for (a), the rate-imposing use case; (b), the rate-conforming use case, needs a new implementation.

For the (a) rate-imposing use case, the implementation currently normalises a configured rate of N tokens per time T, treating it as 1 token per time T/N. I believe this normalisation should be removed.

(My current priorities are in other areas of Polly v8.)

alastairtree commented 3 years ago

(b) being the rate-conforming party, for example being a caller of a system imposing a rate-limit (rather than reject, you want to queue executions to abide by the rate-limit)

These two libraries implement the leaky/token bucket and are useful for implementing the (b) use case, or as alternative packages to help folks who come here looking for one:

madelson commented 3 years ago

What is the status of this effort, and is there consensus on the behaviour? It's a feature we would use, and I'd be interested in contributing (with guidance) if there is interest/need.

Some general thoughts: