bandwidth-throttle / token-bucket

Implementation of the Token Bucket algorithm in PHP.
Do What The F*ck You Want To Public License
503 stars 77 forks source link

Leaky Bucket vs Token Bucket #17

Open bensebborn opened 6 years ago

bensebborn commented 6 years ago

This project is great, but relies upon bootstrapping the bucket (ie filling with initial token allocation)

This can be troublesome when dealing with multiple resources that need limiting - eg rate limiting an API by remote IP address. There's no simple way of knowing if you've already bootstrapped for that particular IP without calling the storage, which is an extra overhead.

Instead, I'd propose a leaky bucket design - instead of checking the bucket still has >0 tokens remaining, you reverse it. So the bucket is empty initially, then you 'drain' X tokens per X seconds and each new rate limited request adds a token.

If the bucket is 'full' (ie tokens > limit, means you are filling faster than emptying, but with a 'limit' sized buffer) then you fail the request.

This way, you never need to bootstrap - you always just increment the bucket counter.

Thoughts?

malkusch commented 6 years ago

Leaky bucket and Token bucket are replaceable. There's nothing you could do with either one and not the other, is it? Bootstrapping is primarily used for setting up the storage (e.g. database schema). In your example you also have to bootstrap a storage as you want to store the state of the leaky bucket per ip address. Nothing gained.

bensebborn commented 6 years ago

Hi @malkusch

My thoughts:

With the token bucket, we need the key to already exist, which requires the bootstrap.

When using Memcached, eg:

//Check if key exists
 if ($this->memcached->get($this->key) !== false) {
  //create key, fill bucket
}

//Go on to empty token from bucket
$this->memcached->decr($this->key,1)

However using the leaky bucket, as we can start with an empty bucket, we can just jump straight in to:

$this->memcached->incr($this->key,1)

Therefore no extra check is required to bootstrap the bucket. On a busy high traffic site, that's 50% less calls to memcached.

Or am I missing something that you've already worked around?

Being a Rest API we want to protect, it's stateless so I can't see a way of preventing that bootstrap check on every request as there's no 'setup' that's shared?

malkusch commented 6 years ago

Therefore no extra check is required to bootstrap the bucket. On a busy high traffic site, that's 50% less calls to memcached.

Ok, I see your point. I reopen this issue as it is valid improvement. However a tough one, so PRs welcome.

it's stateless so I can't see a way of preventing that bootstrap check on every request as there's no 'setup' that's shared?

You have to bootstrap for every new ip address. You could do this with a registration process, but yes IP addresses change. For now as a workaround, just don't bootstrap at all. There will be an exception and you can recover the exception with bootstrapping. That would affect only new ip addresses while existing ip addresses profit from the performance improvement.

Thinking more about that, I think I don't have to change the implementation to Leaky bucket. The goal is just to remove the needless bootstrap checks for the use case "Rate limiting of many resources (IPs)". I think it could be a drastically easier change if I would not require bootstraping by the user, but just specifying a Bootstrap lambda, so that I could then internally do that only when needed (on exceptions).