+1!
I think this is a different concept than the one desired with the current implementation. The idea behind the current `local` policy is to have consistent-hashing load balancing in front of your Kong cluster, so that each client is processed by the same Kong node (which stores the current rate limiting data in memory), hence resulting in the highest consistency, lowest latency combo.
With such an implementation, the rate limiting becomes significantly less consistent depending on the load balancing policy implemented in front of Kong. It also prevents one from implementing such a policy as described above ^
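For illustration, a minimal Lua sketch of that idea, not Kong's actual code: the node list, the use of the client IP as the hash key, and the `pick_node` helper are all assumptions, and a real deployment would rather rely on the LB's own feature (e.g. nginx's `hash $remote_addr consistent;` in an upstream block):

```lua
-- Hypothetical sketch of pinning a client to one Kong node by hashing its IP
-- (an OpenResty balancer context is assumed; real setups would use the LB's
-- built-in consistent hashing instead, e.g. nginx's `hash $remote_addr consistent;`).
local kong_nodes = { "10.0.0.11:8000", "10.0.0.12:8000", "10.0.0.13:8000" }

local function pick_node(client_ip)
  -- the same client IP always maps to the same index, so all of that client's
  -- requests hit the same Kong node and its in-memory counters stay accurate;
  -- note that a plain modulo remaps many clients when the node list changes,
  -- which is exactly what a *consistent* hash ring avoids
  local idx = ngx.crc32_short(client_ip) % #kong_nodes + 1
  return kong_nodes[idx]
end

-- e.g. inside a balancer_by_lua_block:
-- local node = pick_node(ngx.var.remote_addr)
```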
Thibaultcha, I'm confused as to how you can scale a Kong cluster using that logic. What if we make it an option?
@BrianHutchison If the consistent hashing LB makes sure the same user always ends up on the same Kong node, then there is no need to scale. One can just set the same thresholds as used with the `cluster` or `redis` policies.
@Tieske Okay, thanks. Based on my reading, AWS ELBs do not provide that mechanism. The only option is browser cookie-based stickiness - nothing IP-based.
I am sure `local` works for others though -- it's a good point, it just doesn't help me where I'm at.
I'm a bit new to all those infrastructure concepts, but I'm concerned by a few points that I may have misunderstood:

- What if the load per client is really uneven, and some clients do many more requests than others? The load will end up being distributed in a totally unbalanced fashion, right?
- If we make the choice of performance, we choose the `local` policy. If the node dies, all rate information about that client dies with Kong. That could be really damaging for the weekly / monthly / yearly rate limits.
- If that's something that we can't tolerate, we can choose to use `redis` or `cluster`, but they want to ensure strong consistency and for that do many, many incrementation requests (at every incoming request? Not sure about what `ngx_timer_at` does) to the external database. No middle ground.
- If a node dies, one minute will pass before Serf notices its death and removes it from the list of alive instances. Then a bit more time will pass before the LB checks Serf to refresh its list of alive members. That's a total denial of service for all the clients mapped to this node.

That's why I'm proposing #2116. The drawback is that the `X-Rate-Limiting-Remaining` header (or whatever its name is) will return non-monotonically decreasing numbers. I guess telling the clients that this number is only reliable as an estimation is good enough, and if the clients do enough requests to see the false negatives, they have just wrongly sized their request estimation (and must take a bigger plan, or cache the info themselves), and the fault is on them.

What are your thoughts?
> I'm a bit new to all those infrastructure concepts, but I'm concerned by a few points that I may have misunderstood:
>
> - What if the load per client is really uneven, and some clients do many more requests than others? The load will end up being distributed in a totally unbalanced fashion, right?
I don't get this part. The rate limiting will still be applied per consumer/ip, as with the other policies.
> - If we make the choice of performance, we choose the `local` policy. If the node dies, all rate information about that client dies with Kong. That could be really damaging for the weekly / monthly / yearly rate limits.
Those long-running limits are only useful when precise metrics are required, and then `local` never suffices. For general DoS prevention you'd only use short-lived metrics. So in theory you're right, but I don't see the practical use case, or am I missing something?
> - If that's something that we can't tolerate, we can choose to use `redis` or `cluster`, but they want to ensure strong consistency and for that do many, many incrementation requests (at every incoming request? Not sure about what `ngx_timer_at` does) to the external database. No middle ground.
By design they must do one read and one write per request, nothing you can do about it. Not sure what you mean by the `ngx_timer_at` remark? But maybe it comes from the update code of the plugin. Without looking at it: a common use case for the timers is to run code timed at 0 seconds. This means that nginx/openresty will first finish processing the request, and only then find time to execute the timer. Effect: the write to the database will happen after the last byte of the response was sent, and hence will not delay the request at hand.
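To make that concrete, here is a minimal OpenResty sketch of the 0-delay timer pattern; `write_counter_to_db` and the identifier are hypothetical placeholders, not the plugin's actual update code:

```lua
-- Sketch of deferring a database write with a 0-delay timer (lua-nginx-module).
-- `write_counter_to_db` is a hypothetical placeholder, not Kong's plugin code.
local function write_counter_to_db(premature, identifier, delta)
  if premature then
    return  -- the worker is shutting down; skip the write
  end
  -- ... perform the (possibly slow) database increment here ...
end

-- In a request-handling phase (e.g. access_by_lua_block):
local identifier = ngx.var.remote_addr  -- consumer id or client IP
local ok, err = ngx.timer.at(0, write_counter_to_db, identifier, 1)
if not ok then
  ngx.log(ngx.ERR, "failed to create timer: ", err)
end

-- The callback only runs once nginx has finished processing the current
-- request, so the client never waits on the database write.
```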
> - If a node dies, one minute will pass before Serf notices its death and removes it from the list of alive instances. Then a bit more time will pass before the LB checks Serf to refresh its list of alive members. That's a total denial of service for all the clients mapped to this node.
Kong will not route traffic to its own nodes. So the LB in front of Kong should have its own healthchecks on the Kong nodes to see where it can route traffic. This all happens outside the scope of the Serf instance used by Kong itself. So I do not see the issue here?
> I don't get this part. The rate limiting will still be applied per consumer/ip, as with the other policies.
Sure. Let's say you have 4 Kong instances and 4 clients (for simplicity). The LB does a nice job (and you're a bit lucky) and does a 1-to-1 mapping between your clients and your instances. Say client0 is doing a lot of requests, and client3 really few. Then your kong0 will be much more used than kong3, which will mostly idle. No rate-limiting issue here. It's more about resource management, and wanting our instances to have roughly the same amount of work.
That's why in some setups (few clients with very different usage profiles) a round-robin balancer might be a better fit, but it has the problems I mentioned earlier.
> Those long-running limits are only useful when precise metrics are required, and then `local` never suffices. For general DoS prevention you'd only use short-lived metrics. So in theory you're right, but I don't see the practical use case, or am I missing something?
In the company I'm working for, some services have very heavy operations. We want to make sure clients do not abuse them, and we use rate limiting over those long periods. Say this Kong instance is serving both such a service and a more usual service with a really quick response time, which we want to protect from DoS.
Here, unless there's something I didn't get, we face a dilemma:

- If we use the `cluster` or `redis` policy, the small service will have its performance largely impacted by the synchronous db call in `get_usage`.
- If we use the `local` policy and an instance dies, we lose the usage count for the heavy service.

> By design they must do one read and one write per request, nothing you can do about it.

Unless we choose to trade accuracy for speed, and accept to bulk the increments. There may be other options I haven't thought of.
> Not sure what you mean by the `ngx_timer_at` remark? But maybe it comes from the update code of the plugin. Without looking at it: a common use case for the timers is to run code timed at 0 seconds. This means that nginx/openresty will first finish processing the request, and only then find time to execute the timer. Effect: the write to the database will happen after the last byte of the response was sent, and hence will not delay the request at hand.
Alright. I did not know if those requests were made after or during the request validation, and if they were packed or not.
> Kong will not route traffic to its own nodes. So the LB in front of Kong should have its own healthchecks on the Kong nodes to see where it can route traffic. This all happens outside the scope of the Serf instance used by Kong itself. So I do not see the issue here?
I was more or less assuming that the LB would fetch the IPs of the kong instances from Serf. That may be a totally wrong assumption. As I said, I'm new to all those infrastructure concepts :)
> In the company I'm working for, some services have very heavy operations. We want to make sure clients do not abuse them, and we use rate limiting over those long periods. Say this Kong instance is serving both such a service and a more usual service with a really quick response time, which we want to protect from DoS.
Our use cases are fairly similar, as I suspect are many others'. Part of the role of rate limiting is, sure, to prevent incoming calls from overloading upstream services without enough preparation by the upstream service owners. That is roughly DoS protection.
The other role, for us, is that some upstream services essentially have a cost, and you want to prevent calling them excessively without some sort of balance in remuneration that benefits the business. This cost isn't just allocated infrastructure cost, though that counts too. We want to balance and control these calls somewhat between consumers. So for us, absolute precision is never critical, and yet it's not just about DoS either.
All of this is just to say that there is plenty of room for requirements that aren't just DoS protection or accounting-level precision.
@Vermeille I see your point.
Not sure how to implement this though. Afaik the current DAO abstraction won't let you update multiple counters at once. But applying individual transactions later won't save anything 😕. Maybe batching per counter (one increment of 2 instead of 2 increments of 1) can be done and might save something.
Well, what we're doing right now is saving locally all the increments done over a 30s time period, and every 30s we increment the counters in the database by the total accumulated since the last push, and get the usage from the database to resync.
So, well, yeah, every 30s (and we'll make that a parameter, of course), we make slightly bigger calls to the db (and really not that much bigger, actually) to resync the local estimation with what has really been committed to the database.
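Roughly, a sketch of that batching approach, assuming an OpenResty shared dict as the local buffer; the names (`ratelimit_buffer`, `db_increment`, `db_get_usage`) are placeholders and not the code from the PR:

```lua
-- Sketch of "accumulate locally, flush every N seconds".
-- Requires `lua_shared_dict ratelimit_buffer 1m;` in the nginx configuration;
-- `db_increment` and `db_get_usage` stand in for the real storage calls.
local SYNC_INTERVAL = 30  -- seconds; meant to become a plugin parameter

local buffer = ngx.shared.ratelimit_buffer

-- per request: cheap in-memory increment, no database round trip
local function record_hit(identifier)
  buffer:incr(identifier, 1, 0)  -- the third argument initialises missing keys to 0
end

-- every SYNC_INTERVAL seconds: push the accumulated deltas, pull the real totals
local function sync(premature, db_increment, db_get_usage)
  if premature then return end

  for _, identifier in ipairs(buffer:get_keys()) do
    local delta = buffer:get(identifier) or 0
    if delta > 0 then
      db_increment(identifier, delta)  -- one bulk increment instead of `delta` single ones
      buffer:incr(identifier, -delta)  -- keep any hits recorded in the meantime
      -- resync the local estimation with what was really committed
      local usage = db_get_usage(identifier)
      -- ... store `usage` wherever the counters are read from ...
    end
  end
end

-- scheduled from init_worker_by_lua, e.g.:
-- ngx.timer.every(SYNC_INTERVAL, sync, db_increment, db_get_usage)
```

With this kind of scheme, the worst case on a node crash is losing at most the last SYNC_INTERVAL seconds of increments, rather than the whole counter.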
Should I try to PR the code we have so far (which is by no means perfect, but still reasonable) so that you guys can have something concrete to think about?
+1
@Vermeille please do!
https://github.com/Mashape/kong/pull/2169 Here is the associated PR, open for review and criticism :)
closing this in favour of #2169
When using a `local` policy to set a rate limit, it should ideally scale with the number of nodes in the Kong cluster. Such that if you set 100 requests/second and you have 5 nodes, each node gets 20. If it then drops to 4 nodes, it should become 25 per node.
This should probably have a percent-based offset to prevent false negatives.
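As a rough sketch of that arithmetic (the function name and the way the offset is applied are illustrative only, not a proposed API):

```lua
-- Derive a per-node limit from a cluster-wide limit; illustrative only.
-- The percent offset adds headroom against uneven load balancing.
local function per_node_limit(cluster_limit, node_count, offset_percent)
  local base = cluster_limit / node_count
  return math.ceil(base * (1 + (offset_percent or 0) / 100))
end

print(per_node_limit(100, 5))      --> 20 requests/second per node
print(per_node_limit(100, 4))      --> 25 requests/second per node
print(per_node_limit(100, 5, 10))  --> 22, i.e. 10% headroom on each node
```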