artisansdk / ratelimiter

A leaky bucket rate limiter and corresponding middleware with route-level granularity compatible with Laravel.

Redis's reference implementation "Sliding Window Rate Limiting" #20

Closed: qiulang closed this issue 6 months ago

qiulang commented 6 months ago

Hi, this is a question.

The Redis official website has a reference implementation, Sliding Window Rate Limiting, that showcases a use case for sorted sets.

After comparing their Sliding Window Rate Limiting with your Leaky Bucket Algorithm implementation, I feel there is not much difference although they use different metaphors. The main difference is that the Leaky Bucket leaks out at a constant rate, while the sliding window does not slide at a constant rate but only slides when there is a new request and the time gap is larger than the window size. I feel the sliding window seems to have more flexibility in managing bursts of traffic.

And Redis's implementation is rather simple: basically three sorted set commands, zremrangebyscore, zcard, and zadd.
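
For illustration, a minimal sketch of that recipe using the phpredis extension; the key name, limit, and window here are placeholders, not anything from this package:

```php
<?php

// Sketch of Redis's sliding window recipe: evict, count, add.
function allow(Redis $redis, string $client, int $max = 300, int $window = 60): bool
{
    $key = "ratelimit:{$client}"; // one sorted set per client
    $now = microtime(true);

    // 1. Evict entries that have slid out of the window.
    $redis->zRemRangeByScore($key, '0', (string) ($now - $window));

    // 2. Count what remains: the usage inside the last $window seconds.
    if ($redis->zCard($key) >= $max) {
        return false; // over the limit; the rejected request is not recorded
    }

    // 3. Record this request, scored (and named) by its timestamp.
    //    (Colliding timestamps would need a unique suffix in production.)
    $redis->zAdd($key, $now, (string) $now);
    $redis->expire($key, $window); // housekeeping so idle clients' keys expire

    return true;
}
```

A production version would wrap the eviction, count, and add in a MULTI/EXEC block or a Lua script so that concurrent requests can't race between the count and the add.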

Would you consider adding a sliding window to your rate limiter?

dalabarge commented 6 months ago

The three main types of rate limiter algorithms are Fixed Window (Laravel's default), Sliding Window (what Redis uses), and Leaky Bucket, which is what this package implements.

The fixed window allows for up to the maximum requests within the time window before triggering rate limiting. That leads to stuffing requests at the end of one window and the start of the next, which can effectively lead to unwanted bursting or a doubling of the desired max requests over a short span.

The sliding window overcomes this by making the window slide with the requests: instead of starting at a fixed time interval, the time interval starts with the first request. That guarantees the maximum number of requests cannot be gamed within any window, but it doesn't force requests to an arbitrary rate. There are other spin-offs of the sliding window that also shrink/expand the window based on the number of requests already within it, arresting bursts by forcing a constant rate for the remaining requests.

The leaky bucket doesn't track time so much as requests. It has a constant leak rate in requests per second. Requests are leaked out of a bucket holding the total number of requests allowed within a time frame from the first request. In that sense there's a sliding time window, but it's not about the time, because the bucket leaks at a preconfigured constant rate. When excessive requests would exceed the maximum the bucket holds, it overflows. You configure your bucket size as the maximum requests, and you configure your constant rate so that new requests can be made without overflowing, so long as they arrive at a rate less than or equal to the leak rate. For good actors that can feel like starvation when not enough requests are available, but for bad actors it creates a rate limiting effect. Ultimately it forces traffic to normalize around the rate, while allowing for a bit of network variance and bursting based on bucket size. Apps that idle and then burst will be able to get through, but still not with more than the absolute number of requests allocated to the bucket for the timeframe.
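
To make the mechanics concrete, here is a minimal sketch of that leak math in PHP; the class and property names are illustrative, not this package's actual classes:

```php
<?php

// Illustrative leaky bucket: drips are requests, leaked at a constant rate.
final class LeakyBucket
{
    public function __construct(
        private int $capacity = 300, // max drips the bucket holds before overflowing
        private float $rate = 5.0,   // drips leaked per second (the constant rate)
        private float $drips = 0.0,  // current fill level
        private float $updated = 0.0 // timestamp of the last leak calculation
    ) {
    }

    public function fill(float $now): bool
    {
        // Leak at the constant rate for however long we've been idle.
        $elapsed = $this->updated > 0.0 ? $now - $this->updated : 0.0;
        $this->drips = max(0.0, $this->drips - $elapsed * $this->rate);
        $this->updated = $now;

        // Adding this request would overflow the bucket: rate limit it.
        if ($this->drips + 1 > $this->capacity) {
            return false;
        }

        $this->drips += 1;

        return true;
    }
}
```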

I have considered adding the sliding window as an alternative to the leaky bucket algorithm, and I welcome PRs. Most rate limiting on API subscriptions is based on subscription tiers which are metered in requests per minute, hour, day, or month. The leaky bucket is easy to configure for this practical use case, which is why I haven't implemented the sliding window.

qiulang commented 6 months ago

The problem I have with the leaky bucket is that it leaks out at a constant rate; it is like a sliding window at a constant rate. But in my real case scenario, my client does not send requests at a constant rate but rather in a burst within some specific time frame, and normally at other times. So I feel the leaky bucket can't handle that very well.

dalabarge commented 6 months ago

Are you wanting to prevent the bursts and only allow for the normal baseline? The leaky bucket allows for bursts up to the maximum while then allowing for the constant rate (the baseline rate) of additional requests thereafter. Any extra idleness would be perceived by the client as additional burst capacity while still keeping within the total allowed.

A real world example is wanting to limit to a baseline of 300 requests per minute, which is the same as 5 requests per second. If you set the rate as 5 r/s with a maximum bucket of 300, then they can indeed dump 300 requests all at once and then make 1 additional request every 200ms. If they sit idle for 1 second then they can request 5 more without risking rate limiting, but if they make 6 requests when there are already 295 requests in the bucket, they'll get a timeout penalty of say 10 minutes. In 10 minutes all the requests would be available again because the bucket would drain, but the penalty is imposed, so no new requests. What's more, you could configure it to increase the penalty for every additional request attempted during the timeout. They must be completely idle for 10 minutes, otherwise the timeout starts over.

Now take the same example of 300 requests per minute as 5 requests per second, but say you want to ensure at no time the client can make more than 5 simultaneous requests. You'd set the bucket limit to 5. So they can make 5 requests in one second, but will have to wait 200ms between subsequent requests without tripping the bucket. Or if they sit idle for 1 whole second they'll get all 5 requests back, so they can do small bursts of 5 requests every 1 second or 1 request every 200ms at a constant rate. If they sit idle for 2 minutes, they don't get 600 requests of capacity; they are still limited to only 5 requests maximum per second at the constant rate of 1 per 200ms.
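
Using the LeakyBucket sketch from the earlier comment, this second configuration plays out like this (the timestamps, in seconds, are illustrative):

```php
$bucket = new LeakyBucket(capacity: 5, rate: 5.0);

// A burst of 5 at t=0 succeeds; the 6th overflows the bucket.
for ($i = 0; $i < 6; $i++) {
    var_dump($bucket->fill(0.0)); // true five times, then false
}

var_dump($bucket->fill(0.2)); // true: 200ms later one drip has leaked out

var_dump($bucket->fill(1.4)); // true: a full idle second restores all 5 slots
```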

I've got nothing against the sliding window algorithm, as it does have different customization options for various use cases. If you're interested in implementing the algorithm, you can submit a PR and I'll help get it released into this package as an alternative algorithm, complete with the eventing and other mechanisms of this package. Most integrators find the above use case and configuration more than sufficient for their purposes.

qiulang commented 6 months ago

I will try to get someone on my team to submit a PR, as PHP is not my strong suit; it's just that my team uses PHP currently.

qiulang commented 6 months ago

Hi, one more question to make sure we are absolutely on the same page. Let's say I want 300 requests per minute. That is the only requirement I have, so it does not translate to 5 requests per second. I want to prevent this particular scenario: my client sends the 1st request in the 1st second, idles for the next 58 seconds, then sends 299 requests in the 59th second; so far this is allowed. Then in the 60th second, if the client sends more than 1 request, he violates the 300 requests per minute requirement, because counting from the 59th second to the 119th second, within that one minute period he sends more than 300 requests. So if he sends 299 requests in the 59th second, he can only send 1 more request until the 120th second.

If I use the leaky bucket, can I achieve this 300 requests per minute limit?

dalabarge commented 6 months ago

In short, no.

If you want to limit the maximum number of requests within a certain time frame, you'd use the sliding window algorithm. For a more consistent request rate, the leaky bucket algorithm is preferable. The choice between them depends on whether you're configuring for maximum load or for load balancing. Based on your use case, if 300 maximum within any 60 second window is your intent, then go with the sliding window algorithm.

Sliding Window

The sliding window is effectively asking: how many requests have been made in the last 60 seconds? That's the usage. You take the maximum minus the usage to get the remaining balance. You can burst up to that remaining balance at any given time.

At the first request the look back to T-60s contains 0 requests, so the balance is 300, and afterwards it's a 299 balance. Sixty seconds after the first request, you'd be at time T. In the sliding window look back period of T-60 you'd have executed 300 requests, so you have a balance of 0. If you idle another 10 seconds such that you're now at T+10, the look back period is T+10-60 = T-50, within the original window. Therefore you have 299 requests counted and a balance of 1. You'll have to wait till T+60 of the original time window to get back to a balance of 300.
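
In code, the usage and balance at any instant reduce to a timestamp filter (a sketch; the variable names are illustrative):

```php
$window = 60;   // seconds
$max = 300;     // requests allowed per window
$requests = []; // timestamps of this client's prior requests
$now = microtime(true);

// Usage is simply how many timestamps still fall inside the look back period.
$usage = count(array_filter($requests, fn (float $ts): bool => $ts > $now - $window));
$balance = $max - $usage; // the client may burst up to this many right now
```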

So for any given 60 second window you'd never have more than 300 requests made by that client. You are, however, still subject to "bursty" behavior, which can be too much load for the service. You could set 150 requests every 30 seconds, or 50 requests every 10 seconds, or 5 requests every second to coerce bursting behavior to normalize to the preferred rate, but you sacrifice the peak rate of 300 instantaneous requests. The sliding window is designed to guarantee a maximum number of requests in any given time window, not to maintain a constant rate of requests, and any attempt to maintain a constant rate diminishes your bursting capacity overall.

Leaky Bucket

With the leaky bucket algorithm, the client makes 1 request at T-60, then 299 at T-1. While the client has made 300 requests in 60 seconds, their remaining balance of requests at T is 300 - 299 + 5 = 6 requests and not 0 requests. With a constant leak rate configured at the equivalent of 5 requests per second (300 requests / 60 seconds = 5 r/s), the first request at T-60 is leaked before T-59 has even happened.

The bucket can't over-leak, so while the client is idle the bucket remains at 300 remaining requests until T-1, when those 299 requests are made. At that moment you still have a balance of 300 - 299 = 1 request. Then the next second another 5 requests are leaked, so you have 6 requests remaining. If you dumped all 300 requests at T-60, then at T-1 you'd be able to dump another 295 because the bucket would have leaked them all out.

Therefore in that 60 second window you'd have 595 requests possible. At T you'd still have a 5 request balance remaining, so in 61 seconds you could make up to 600 requests. At T+60 you could make another 300, so that's 900 requests in 120s, which is a 7.5 r/s window average. That's the peak rate tolerated by the bursting allowance of the algorithm. You'll note that it's 150% of the original 5 r/s rate.

If you set the maximum request bursting to 150 while maintaining the 5 r/s rate, you'd only be able to dump 150 at T-60, then have to wait 150 / 5 = 30s before being able to dump another 150 requests at T-30. Wait again until T to dump another 150, which is then 450 requests in 60 seconds. That is still a 7.5 r/s maximum rate within the window, but a reduced burst capacity.

Lowering the burst capacity to a 50 maximum means you can dump 50 every 10 seconds (based on the 5 r/s rate), so T-60, T-50, ... T-10, T = 350 requests in 60 seconds, which squeezes the averaged rate closer to 5.8 r/s, only 116% of the drip rate. So the leaky bucket lets you tune how much burstiness the algorithm tolerates independently of the normal request rate. That's only possible because it's not concerned with a maximum number of requests, only with rate limiting, sort of like a load balancer. The coupling of burstiness and request rate within window algorithms is their main distinction from the leaky bucket, which decouples the two parameters for finer control.
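
The pattern in those three cases reduces to a simple formula: over an inclusive 60 second span the client can land one full bucket up front plus everything the bucket leaks (and the client refills) during the span. As a sketch, with illustrative names:

```php
// Peak requests a client can land in an inclusive $window second span.
function windowMax(int $capacity, float $rate = 5.0, int $window = 60): float
{
    return $capacity + $rate * $window;
}

windowMax(300); // 600 (the "up to 600 requests in 61 seconds" case above)
windowMax(150); // 450 -> 7.5 r/s averaged over the window, 150% of the leak rate
windowMax(50);  // 350 -> ~5.8 r/s averaged, 116% of the leak rate
```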

Normalizing Client Request Rate

The leaky bucket algorithm is designed to slow the client down to the preferred request rate while tolerating their burstiness. You typically look at your request volume and usage rate, or the maximum load rate of your service or application. You then set your leak rate to +1 stdev of the mean (average) rate, which means 84% of all client requests can/should be limited to that rate. For example, say the service can handle up to 30K requests per minute, which is 500 requests per second. You'd set your requests per second rate to 84% of that, or 420 requests per second, which if you had 100 max clients would be 4.2 r/s per client.

Then you should set your maximum peak rate as measured at +2 stdev (97%) of the mean rate. That would be 485 requests per second, or 4.85 r/s per client. That allows clients to request at your preferred constant rate while letting them go as much as 1 stdev above the set rate, which is 2 stdev above the mean rate of 250 r/s. If any client exceeds their 4.85 r/s rate averaged over the minute, they should be considered a deviation from the normal (anomalous) and be put into timeout, aka rate limited.

Notice how normalizing is done by choosing a good requests per second leak rate and making sure it's good enough for all 100 clients to request at that constant rate without overloading the service. Then you work from the 4.85 r/s peak rate to get to a maximum number of requests allowed in any burst capacity. A normal rate of 4.2 r/s = 252 r/m, and 4.85 / 4.2 = 1.15 (115% of the normal rate), which corresponds to your optimal number of burst requests (i.e., rounded up to 5 requests max). If all clients made 5 requests every second, that'd be 500 requests per second, which is your maximum load.
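
Here is that arithmetic as straight PHP; the percentages and client count are the illustrative figures from this example:

```php
$capacity = 30000 / 60; // service ceiling: 500 r/s
$clients = 100;

$leak = 0.84 * $capacity / $clients; // 4.2 r/s per client (+1 stdev cutoff)
$peak = 0.97 * $capacity / $clients; // 4.85 r/s per client (+2 stdev cutoff)
$burst = (int) ceil($peak);          // ~5 requests of burst headroom per client
```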

You usually don't have that many greedy clients, so if 80% of your clients stayed below 180 r/m, that's a rate of 3 r/s for 80 clients, which is only 240 r/s total out of your 500 r/s capacity. So you take the 260 r/s of remaining capacity and divide by the 20 greedier clients to get a rate of 13 r/s. You could therefore easily double the burst capacity from 5 for everyone to 10 for everyone, and maybe even get away with 15 if those 20 greedy clients aren't all equally greedy. You could set the burst capacity to any arbitrary amount you want to tolerate at any given time, but you have to be careful that it's not going to get abused by the aggregate of your clients such that you're over service capacity. Remember, the leaky bucket allows for as much as 2x the bursting maximum number of requests within the implied time window set by the rate.

Dynamic Rates

With this package you can actually program the rates to be dynamically recalibrated every 5 or 10 minutes based on the stdev of current load or average usage. That means that if there are fewer clients, everyone gets to use the service more, and if there are more clients, everyone is throttled to stay within normal limits while the clients that are being resource hogs get banned. Leaky buckets are used by many automated systems where the purpose is to normalize and distribute load because of fixed capacity and variable usage.
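
A hypothetical recalibration job, run on that 5 or 10 minute schedule, might look like the sketch below; the function and its inputs are assumptions for illustration, not part of this package's API:

```php
// Hypothetical: redistribute fixed service capacity across active clients.
function recalibrate(float $capacityPerSecond, int $activeClients): array
{
    $clients = max(1, $activeClients);

    return [
        'rate' => 0.84 * $capacityPerSecond / $clients,            // per-client leak rate
        'max' => (int) ceil(0.97 * $capacityPerSecond / $clients), // per-client burst size
    ];
}

// Fewer clients online means everyone gets a higher rate; more clients
// means everyone is throttled harder to stay within the same capacity.
$settings = recalibrate(500.0, 42);
```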

It's great for service flood control and prevention, which is more of a system-wide concern and not a specific client rate limit. Good per-client rates will not tolerate a singular client crashing your system, but in coordination/aggregate clients can create a flood, which is where the leak rate of the bucket needs to be dynamically set while bursting is reduced to allow the flood to subside by normalizing the inflow.

qiulang commented 6 months ago

Thanks for the detailed explanation. The key takeaways (for me) are:

  1. The leaky bucket allows for as much as 2x the bursting maximum number of requests within the implied time window set by the rate.
  2. The leaky bucket algorithm is designed to slow the client down to the preferred request rate while tolerating their burstiness. The coupling of burstiness and request rate within [sliding] window algorithms is their main distinction from the leaky bucket, which decouples the two parameters for finer control. (as the sliding window does not care about request rate)

BTW, I asked ChatGPT 4, Gemini, and Claude v3 about their differences. None of them got the point (an anecdote to go with my question).

[Screenshots of the Gemini, GPT-4, and Claude responses]

Cheers

dalabarge commented 6 months ago

Claude did best at explaining the Leaky Bucket's benefit: "maintaining an overall steady request rate".

Note as well that Leaky Buckets are often deployed for their actual leak rate steadiness, which GPT explained, as in the case of a job queue. The bucket fills with queued jobs up to the maximum allowed by the bucket, then each job is leaked to workers at the fixed rate. This absorption of input flow with a steady output flow is the flood control I was talking about. Most people use this package, though, for the improvement over the fixed window algorithm used by Laravel.

I can see your point that the sliding window might be the preferred option for those looking to improve on the fixed window. I'll prioritize it for a possible new release.