Kong / kong

🦍 The Cloud-Native API Gateway and AI Gateway.

Rate-limiting: make the local threshold scale with the number of nodes #2127

Closed: Tieske closed this issue 7 years ago

Tieske commented 7 years ago

When using a local policy to set a rate limit, it should ideally scale with the number of nodes in the Kong cluster.

For example, if you set 100 requests/second and have 5 nodes, each node gets 20 per second. If the cluster then drops to 4 nodes, it should become 25 per node.

This should probably have a percentage-based offset to prevent false negatives.
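
Roughly the math I have in mind, as a sketch only; `cluster_limit`, `node_count` and `offset_percent` are illustrative names, not existing Kong configuration values:

```lua
-- Sketch: derive a per-node threshold from a cluster-wide limit.
-- The inputs are hypothetical; Kong's local policy has no such settings today.
local function per_node_limit(cluster_limit, node_count, offset_percent)
  local base  = cluster_limit / node_count           -- e.g. 100 / 5 = 20
  local slack = base * (offset_percent or 0) / 100   -- headroom against false negatives
  return math.floor(base + slack)
end

print(per_node_limit(100, 5, 10))  -- 22: 20 per node plus a 10% offset
print(per_node_limit(100, 4, 10))  -- 27: the per-node limit grows when a node drops out
```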

BrianHutchison commented 7 years ago

+1!

thibaultcha commented 7 years ago

I think this is a different concept than the one intended by the current implementation. The idea behind the current local policy is to have consistent-hashing load balancing in front of your Kong cluster, so that each client is processed by the same Kong node (which stores the current rate-limiting data in memory), hence resulting in the highest-consistency, lowest-latency combination.

With such an implementation, the rate limiting becomes significantly less consistent depending on the load-balancing policy implemented in front of Kong. It also prevents one from implementing a policy such as the one described above.
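
For illustration, one way to get that "same client, same Kong node" behaviour is consistent hashing at the balancer itself. A minimal sketch with nginx as the balancer (the node addresses are placeholders, and this is only one possible setup, not something Kong requires):

```nginx
upstream kong_cluster {
    # route each client IP to the same Kong node
    hash $binary_remote_addr consistent;
    server kong-node-1:8000;
    server kong-node-2:8000;
    server kong-node-3:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://kong_cluster;
    }
}
```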

BrianHutchison commented 7 years ago

@thibaultcha I'm confused as to how you can scale a Kong cluster using that logic. What if we make it an option?

Tieske commented 7 years ago

@BrianHutchison If the consistent hashing LB makes sure the same user always ends up on the same Kong node, then there is no need to scale. One can just set the same thresholds as used with the cluster or redis policies.

BrianHutchison commented 7 years ago

@Tieske Okay, thanks. Based on my reading, AWS ELBs do not provide that mechanism. The only option is browser cookie-based stickiness; nothing IP-based.

I am sure "local" works for others though; it's a good point, it just doesn't help me where I'm at.

Vermeille commented 7 years ago

I'm a bit new to all these infrastructure concepts, but I'm concerned about a few points that I may have misunderstood:

  1. What if the load per client is really uneven, i.e. some clients make many more requests than others? The load will end up being distributed in a totally unbalanced fashion, right?
  2. If we opt for performance, we choose the local policy. If a node dies, all the rate information about its clients dies with it. That could be really damaging for the weekly / monthly / yearly rate limits.
  3. If that's something we can't tolerate, we can choose redis or cluster, but those policies want to ensure strong consistency and, for that, make many increment requests (at every incoming request? I'm not sure what ngx_timer_at does) to the external database. There is no middle ground.
  4. If a node dies, one minute will pass before Serf reports its death and removes it from the list of alive instances. Then a bit more time will pass before the LB checks Serf to refresh its list of alive members. That's a total denial of service for all the clients mapped to that node.

That's why I'm proposing #2116.

  1. A classic round robin load balancer will ensure an even load on all instances.
  2. If a node dies, little info is lost.
  3. It's a good middle ground between performance and consistency. Though, and this may be a major drawback depending on the app, if one node has reached its local request limit while the others have not, we will tell the client that their rate limit is exceeded, yet answer positively to the very next request that gets routed to another instance. Also, the X-Rate-Limiting-Remaining header (or whatever its name is) will return numbers that do not decrease monotonically. I guess telling clients that this number is only reliable as an estimate is good enough, and if clients make enough requests to see the false negative, they have simply under-sized their request estimate (and must take a bigger plan, or cache the info themselves), so the fault is on them.
  4. If a node dies, only about 1/N of that client's requests won't be served (statistically) before the LB acknowledges the death of that node.

What are your thoughts?

Tieske commented 7 years ago

I'm a bit new to all these infrastructure concepts, but I'm concerned about a few points that I may have misunderstood:

  1. What if the load per client is really uneven, i.e. some clients make many more requests than others? The load will end up being distributed in a totally unbalanced fashion, right?

I don't get this part. The rate limiting will still be applied per consumer/ip, as with the other policies.

  1. If we opt for performance, we choose the local policy. If a node dies, all the rate information about its clients dies with it. That could be really damaging for the weekly / monthly / yearly rate limits.

Those long-running limits are only useful when precise metrics are required, and then local never suffices. For general DoS prevention you'd only use short-lived metrics. So in theory you're right, but I don't see the practical use case, or am I missing something?

  1. If that's something we can't tolerate, we can choose redis or cluster, but those policies want to ensure strong consistency and, for that, make many increment requests (at every incoming request? I'm not sure what ngx_timer_at does) to the external database. There is no middle ground.

By design they must do one read and one write per request; there is nothing you can do about that. Not sure what you mean by the ngx_timer_at remark; perhaps you are referring to the plugin's update code. Without looking at it: a common use case for the timers is to run code with a 0-second delay. This means that nginx/OpenResty will first finish processing the request, and only then execute the timer. Effect: the write to the database happens after the last byte of the response has been sent, and hence does not delay the request at hand.
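
For the curious, a minimal sketch of that zero-delay timer pattern (OpenResty API, meant to run inside a request handler; `increment_counter_in_db` is a placeholder, not the plugin's actual code):

```lua
-- Placeholder for the real datastore write done by the plugin.
local function increment_counter_in_db(premature, identifier, value)
  if premature then
    return  -- the worker is shutting down, skip the write
  end
  -- ... connect to the datastore and apply the increment here ...
end

-- Schedule the write with a 0-second delay: nginx finishes sending the
-- response first and only then runs the timer, so the database write
-- never adds latency to the request being processed.
local identifier = ngx.var.remote_addr  -- e.g. the per-client rate-limit key
local ok, err = ngx.timer.at(0, increment_counter_in_db, identifier, 1)
if not ok then
  ngx.log(ngx.ERR, "failed to create timer: ", err)
end
```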

  1. If a node dies, one minute will pass before Serf reports its death and removes it from the list of alive instances. Then a bit more time will pass before the LB checks Serf to refresh its list of alive members. That's a total denial of service for all the clients mapped to that node.

Kong will not route traffic to its own nodes. So the LB in front of Kong should have its own healthchecks on the Kong nodes to see where it can route traffic. This all happens outside the scope of the Serf instance used by Kong itself. So I do not see the issue here?

Vermeille commented 7 years ago

I don't get this part. The rate limiting will still be applied per consumer/ip, as with the other policies.

Sure. Let's say you have 4 Kong instances and 4 clients (for simplicity). The LB does a nice job (and you're a bit lucky) and maps your clients 1-to-1 to your instances. Say client0 makes a lot of requests, and client3 makes really few. Then your kong0 will be used much more heavily than kong3, which will mostly idle. No rate-limiting issue here; it's more about resource management, and wanting our instances to have roughly the same amount of work.

That's why in some setups (few clients with very different usage profiles), a round-robin balancer might be a better fit, but it has the problems I mentioned earlier.

Those long-running limits are only useful when precise metrics are required, and then local never suffices. For general DoS prevention you'd only use short-lived metrics. So in theory you're right, but I don't see the practical use case, or am I missing something?

In the company I work for, some services have very heavy operations. We want to make sure clients do not abuse them, so we use rate limiting over those long periods. Say a Kong instance is serving both such a service and a more usual service with a really quick response time, which we want to protect from DoS.

Here, unless there's something I didn't get, we face a dilemma:

By design they must do one read and one write per request; there is nothing you can do about that.

Unless we choose to trade accuracy for speed and accept batching the increments. There may be other options I haven't thought of.

Not sure what you mean by the ngx_timer_at remark; perhaps you are referring to the plugin's update code. Without looking at it: a common use case for the timers is to run code with a 0-second delay. This means that nginx/OpenResty will first finish processing the request, and only then execute the timer. Effect: the write to the database happens after the last byte of the response has been sent, and hence does not delay the request at hand.

Alright. I did not know whether those requests were made after or during request validation, or whether they were batched or not.

Kong will not route traffic to its own nodes. So the LB in front of Kong should have its own healthchecks on the Kong nodes to see where it can route traffic. This all happens outside the scope of the Serf instance used by Kong itself. So I do not see the issue here?

I was more or less assuming that the LB would fetch the IPs of the Kong instances from Serf. That may be a totally wrong assumption. As I said, I'm new to all these infrastructure concepts :)

BrianHutchison commented 7 years ago

In the company I work for, some services have very heavy operations. We want to make sure clients do not abuse them, so we use rate limiting over those long periods. Say a Kong instance is serving both such a service and a more usual service with a really quick response time, which we want to protect from DoS.

Our use cases are fairly similar, as I suspect others' are. Part of the role of rate limiting is, sure, to prevent incoming calls from overloading upstream services whose owners haven't prepared them for the load. That is roughly DoS protection.

The other role for us is that some upstream services essentially have a cost, and you want to prevent calling them excessively without some balancing remuneration that benefits the business. This cost isn't just allocated infrastructure cost, though that counts too. We want to balance and control these calls somewhat across consumers. So for us, absolute precision is never critical, and yet it's not just about DoS either.

All of this is just to say that there is plenty of room for requirements that fall between DoS protection and accounting-level precision.

Tieske commented 7 years ago

@Vermeille I see your point.

Not sure how to implement this though. AFAIK the current DAO abstraction won't let you update multiple counters at once. But applying individual transactions later won't save anything 😕. Maybe batching per counter (one increment by 2 instead of two increments by 1) can be done and might save something.

Vermeille commented 7 years ago

Well, what we're doing right now is saving locally all the increments made during a 30s window; every 30s we increment the counters in the database by the total accumulated since the last push, and fetch the usage from the database to resync.

So, yeah, every 30s (and we'll make that a parameter, of course), we make slightly bigger calls to the db (really not that much bigger) to resync the local estimate with what has actually been committed to the database.
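
To make it concrete, here is a minimal sketch of that batching scheme (assuming OpenResty with `ngx.timer.every` available; the `db_*` helpers are hypothetical, and this is not our actual code):

```lua
-- Per-worker buffer: counter key -> increments accumulated since the last flush.
local pending = {}

-- Called on every request instead of hitting the database directly.
local function local_increment(key, value)
  pending[key] = (pending[key] or 0) + (value or 1)
end

-- Periodic flush: push the accumulated increments to the datastore and
-- pull back the committed totals to resync the local estimate.
local function flush_to_db(premature)
  if premature then
    return  -- the worker is shutting down
  end
  for key, delta in pairs(pending) do
    -- db_increment(key, delta)       -- hypothetical datastore helper
    -- local committed = db_get(key)  -- refresh the local estimate
    pending[key] = nil
  end
end

-- Flush every 30 seconds (the interval would of course be configurable).
local ok, err = ngx.timer.every(30, flush_to_db)
if not ok then
  ngx.log(ngx.ERR, "failed to create flush timer: ", err)
end
```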

Should I open a PR with the code we have so far (which is by no means perfect, but still reasonable) so that you guys have something concrete to think about?

depay commented 7 years ago

+1

Tieske commented 7 years ago

@Vermeille please do!

Vermeille commented 7 years ago

Here is the associated PR, open for review and criticism: https://github.com/Mashape/kong/pull/2169 :)

Tieske commented 7 years ago

Closing this in favour of #2169.