Closed: davidor closed this issue 6 years ago.
If it's acceptable to reset rate limits when the gateway restarts, we can add a timestamp or a random string to the key during initialization.
@y-tabata I think it will depend a lot on the use case.
For limits with small periods (seconds, minutes) it might be acceptable. However, 3scale supports big periods for limits (like a month, or a year). In that case, resetting the counters and the limiters is not acceptable.
Regarding `connection` and `leaky bucket`, it's no problem to reset rate limits, because when the gateway restarts there are no connections. So we need to focus on `fixed window` only.
There are five cases for each parameter, as illustrated in the following examples.
`window`:

| old window | current | new window | expected window | reset? |
|---|---|---|---|---|
| 10 | 5 | 3 | 3 | yes |
| 10 | 5 | 5 | 5 | yes |
| 10 | 5 | 7 | 7 | yes |
| 10 | 5 | 10 | 5 | no |
| 10 | 5 | 15 | 15 | yes |
`count`:

| old count | current | new count | expected count | reset? |
|---|---|---|---|---|
| 10 | 5 | 3 | 3 | yes |
| 10 | 5 | 5 | 5 | yes |
| 10 | 5 | 7 | 7 | yes |
| 10 | 5 | 10 | 5 | no |
| 10 | 5 | 15 | 15 | yes |
Regarding `window`: if the new `window` is shorter than the current `window`, we have to reset the rate limit. If the new `window` is longer than the current `window` and the `window` value has changed, we can assume that users restarted the gateway in order to reset the rate limit; it would be strange for the limiter in the real environment to keep following the current `window` even though users changed the `window` value. So we have to reset the rate limit. But if the new `window` equals the old `window`, we cannot assume that users restarted the gateway in order to reset the rate limit, so we don't reset it. The same is true of `count`.

So we store the `count` and the `window` in Redis, and if the new `window` equals the old `window` and the new `count` equals the old `count`, then we don't reset the rate limits. Otherwise, we reset them.
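The decision rule above can be sketched in a few lines (a minimal illustration in Python; in the actual policy the old `window` and `count` would be stored in Redis next to the counters):

```python
def should_reset(old_window, new_window, old_count, new_count):
    """Decide whether to reset a fixed-window rate limit after a
    configuration change, per the rule above: reset unless both the
    window and the count are unchanged."""
    return not (new_window == old_window and new_count == old_count)

# Examples matching the tables above (old value 10 in both columns):
assert should_reset(10, 3, 10, 10) is True    # shorter window -> reset
assert should_reset(10, 15, 10, 10) is True   # longer window -> reset
assert should_reset(10, 10, 10, 10) is False  # unchanged -> keep counters
```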
@y-tabata you are thinking that the key is going to be known without the request.
That is not going to be the case all the time. We plan to add templating to the key, so you can, for example, rate limit by `{{ ngx.var.remote_addr }}` to use the client IP address. That value would be interpolated in the rewrite phase; the key would not be known until the evaluation.
I think it is first important to define what we want - business logic wise - and then proceed to the technical solution.
https://github.com/openresty/lua-resty-limit-traffic/blob/ef080731b80fe00a385a4aed8a7f7f575d01c152/lib/resty/limit/count.lua#L47 uses decrements to maintain the connection limit. That means on change of the limit the current window would first have to expire.
I understand this might be confusing and probably counterproductive. Example scenario: the limit is set too high for too long a window, and due to an attack it needs to be lowered; having to wait for the window to expire would be a pretty limiting factor, preventing a quick reaction to the attack.
Proposal: use increments instead of decrements to maintain current connection count and apply the limit after that.
This limiter is configured with a per-second rate limit and burst. I think it is OK to ignore it, as its very low granularity means it would effectively reload instantly.
https://github.com/openresty/lua-resty-limit-traffic/blob/ef080731b80fe00a385a4aed8a7f7f575d01c152/lib/resty/limit/count.lua#L47 is implemented the same way as the connection rate limiter: it decrements from the limit's default value. Switching to maintaining the current value and applying the limit outside would solve the problem.
I think we should investigate cost of changing to increment model and see if we can push those changes upstream. If not, we could simply fork the library.
Also, there is already some discussion about changing limits on the fly: https://github.com/openresty/lua-resty-limit-traffic/issues/23
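The increment model proposed here can be sketched as follows (a minimal Python illustration, not the library's API; a dict stands in for the nginx shared dict or Redis):

```python
class IncrementingFixedWindow:
    """Sketch of the proposed increment model: store only the current
    request count per window and compare it against the limit at check
    time. Changing `limit` then takes effect immediately, with no need
    to wait for the window to expire (unlike the decrement model, which
    bakes the limit into the stored remaining value)."""

    def __init__(self, limit, window):
        self.limit = limit      # can be changed at any time
        self.window = window    # seconds
        self._counts = {}       # stands in for a shared dict / Redis

    def incoming(self, key, now):
        window_start = int(now) - int(now) % self.window
        bucket = (key, window_start)
        count = self._counts.get(bucket, 0) + 1
        self._counts[bucket] = count
        return count <= self.limit

limiter = IncrementingFixedWindow(limit=2, window=60)
assert limiter.incoming("client-a", now=1000) is True
assert limiter.incoming("client-a", now=1001) is True
assert limiter.incoming("client-a", now=1002) is False  # over limit
limiter.limit = 5                                        # raise limit on the fly
assert limiter.incoming("client-a", now=1003) is True    # allowed immediately
```

In a real deployment the increment would need to be atomic (e.g. a shared-dict `incr` or Redis `INCR`), which is exactly the race-condition concern raised below.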
@mikz Thank you for your advice.
> so you can for example rate limit by `{{ ngx.var.remote_addr }}` to use client IP address.
If we use the client IP address, the scope will be the client. It's good templating for when we implement the `client` scope in this policy in the future, but considering the `service` or the `global` scope, we'd like to use other templating.
"connection"
https://github.com/openresty/lua-resty-limit-traffic/blob/ef080731b80fe00a385a4aed8a7f7f575d01c152/lib/resty/limit/count.lua#L47 uses decrements to maintain the connection limit. That means on change of the limit the current window would first have to expire.
That link points to "fixed window" (`count.lua`), not to the connection limiter. I think "connection" already uses the increment model.
> I think we should investigate cost of changing to increment model and see if we can push those changes upstream. If not, we could simply fork the library.
>
> Also, there is already some discussion about changing limits on the fly: openresty/lua-resty-limit-traffic#23
In my understanding, if a user changes the parameters of a policy (from the UI), APIcast restarts and the `init` and `init_worker` phases are executed. Is my understanding correct? If so, we don't need to consider changing limits on the fly.
One thing we have to consider is whether, when a user changes the parameters of the "fixed window" limiter, we should reset the `remaining` value or not. #703 resets it, but you think it's not acceptable to reset it, don't you?
The issue (openresty/lua-resty-limit-traffic#23) doesn't look active. How about using the library and leaving it as a decrement model? For example, when the `count` is changed from 10 to 20, we increment the (remaining) value by 10. When the `window` is changed from 3000 to 2000, we decrement the expiration by 1000.
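The in-place adjustment idea could look roughly like this (a Python sketch with a hypothetical `adjust_decrement_counter` helper; the real counter and TTL would live in Redis, and the synchronization concerns raised in the next reply still apply):

```python
def adjust_decrement_counter(remaining, ttl, old_count, new_count,
                             old_window, new_window):
    """Sketch of adjusting a decrement-model counter in place when the
    configuration changes, instead of resetting it: shift the remaining
    value by the change in `count`, and the expiration by the change in
    `window` (values as in the example above, window in ms)."""
    remaining += new_count - old_count   # e.g. count 10 -> 20: +10
    ttl += new_window - old_window       # e.g. window 3000 -> 2000: -1000
    return remaining, ttl

# count raised 10 -> 20 with 4 requests already used (remaining 6):
assert adjust_decrement_counter(6, 3000, 10, 20, 3000, 3000) == (16, 3000)
# window shortened 3000 -> 2000:
assert adjust_decrement_counter(6, 3000, 10, 10, 3000, 2000) == (6, 2000)
```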
@y-tabata
> If we use client IP address, the scope will be client. It's good templating when we will implement the client scope in this policy in the future, but considering the service or the global scope, we'd like to use other templating.
Client IP is just an example. With templating, the value can be any parameter from the request, which might include, for example, a parsed JWT token. We don't want to extend the policy with a dedicated scope for each thing people might want to rate limit by.
Service and global scope still make sense. For example, in our SaaS we won't allow people to use the global scope. So even if they have a templated rate-limiting key `{{ ngx.var.remote_addr }}`, because of the service scope it would get prefixed with the service id into something like `service_id:XX/{{ ngx.var.remote_addr }}`. Now imagine a scenario where you want to combine those, for example rate limiting a path per client IP: `{{ ngx.var.remote_addr }}/{{ ngx.var.uri }}`. The nginx examples work on the same principle: the rate-limiting key is known only when the request arrives, because it uses variables like `$remote_addr`.
We want to allow users to rate limit by anything available in the context (any nginx variable, parsed JWT, etc.) without us doing the work.
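A minimal stand-in for this kind of key templating (plain Python with a regex; as discussed later in the thread, the actual plan is to use Liquid, so this is only an illustration of the idea):

```python
import re

def render_key(template, context):
    """Replace each `{{ expr }}` in a rate-limiting key template with
    the value looked up in `context`. At request time the context would
    hold nginx variables, parsed JWT claims, etc."""
    def lookup(match):
        return str(context[match.group(1).strip()])
    return re.sub(r"\{\{(.*?)\}\}", lookup, template)

ctx = {"ngx.var.remote_addr": "203.0.113.7", "ngx.var.uri": "/v1/items"}
assert render_key("{{ ngx.var.remote_addr }}/{{ ngx.var.uri }}", ctx) \
    == "203.0.113.7//v1/items"
assert render_key("service_id:42/{{ ngx.var.remote_addr }}", ctx) \
    == "service_id:42/203.0.113.7"
```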
The link is "fixed window"'s link. I think "connection" is the increment model.
You are right. Sorry for the confusion. Indeed the "connection" is incrementing: https://github.com/openresty/lua-resty-limit-traffic/blob/ef080731b80fe00a385a4aed8a7f7f575d01c152/lib/resty/limit/conn.lua#L53
> In my understanding, if a user changes parameters of a policy (from UI), apicast restarts and the init and the init_worker phases are executed. Is my understanding correct? If so, we don't need to consider changing limits on the fly.
Unfortunately, no. `init` and `init_worker` are executed only when APIcast starts/restarts. When it reloads configuration, they are not executed (because it does not restart).
We also have to consider cases where there are several gateways running with different configurations (either rolling updates, or just because of restarts or pulling the config at different times).
> One thing we have to consider is when a user changes parameters of the "fixed window" limiter, we should reset the remaining or not. #703 resets it but you think it's not acceptable to reset it, don't you?
So if the rate limit is changed from 50 to 100, it does not make sense that the app could do more than 100 when that happens. If we kept only the "remaining" value and reset it, this could happen, right?
I think the only sane way of making changes/upgrades work is to track the current usage and just apply the limit independently instead of tracking the remaining. One of the nice benefits would be that you can have two gateways with different limits using the same key. Maybe for internal/external network.
> The issue (openresty/lua-resty-limit-traffic#23) looks not active.

It does not, but it still has some useful information. We can try to revive it by saying we would like to change the model to increments so it can accept changes to the limit, and see if the maintainers are OK with it.
> How about using the library, leaving it as decrement model? For example, when the count is changed 10 to 20, we increment the (remaining) value by 10. When the window is changed 3000 to 2000, we decrement the expiration by 1000.
I think that is a hard problem. For example, if there are 10 gateways and they start pulling configuration at different times, how would you know whether the change was applied or not? What about plain gateway restarts?

Keeping state in sync is a hard problem, and from my experience it is easier to flip the problem on its head and try to eliminate it rather than fight it.
I'll try to revive openresty/lua-resty-limit-traffic#23 and see if they'd be ok with switching to increment model. IMHO that is the best path forward and would unify how the rate limiters are implemented.
@mikz
> And now imagine scenario when you want to combine those into for example rate limiting path per client ip: {{ ngx.var.remote_addr }}/{{ ngx.var.uri }}. The nginx examples work on the same principal. The rate limiting key is known only when the request arrives because it is using variables like $remote_addr.
I understand. So users can rate limit by "client" or "method" and so on. In order to do that, we have to add the `rewrite` method to interpolate the templates, and either add the list of templates to the JSON schema (`key` { `name` }) or define some rule for using templates, for example putting the template string in double brackets: `{{ ngx.var.remote_addr }}`.
> Unfortunately no. init and init_worker are executed only when APIcast starts/restarts. When it reloads configuration they are not executed (because it does not restart).
I understand. So in the `access` method, we have to check whether the configuration was changed or not, and call an on-the-fly method like `set_conn`. Regarding `connection`, `set_delay` is insufficient. Regarding `fixed window`, there are no on-the-fly methods. We should add these methods to lua-resty-limit-traffic.
> One of the nice benefits would be that you can have two gateways with different limits using the same key. Maybe for internal/external network.
This is a very nice benefit, I think.
> I'll try to revive openresty/lua-resty-limit-traffic#23 and see if they'd be ok with switching to increment model. IMHO that is the best path forward and would unify how the rate limiters are implemented.
Thanks for your help.
@y-tabata
> I understand. So users can rate limit by "client" or "method" and so on. In order to do that, we have to add the rewrite method to interpolate the templatings and add the list of templatings to the json schema (key { name }) or define some rule to use templatings, for example, put the templating string in double brackets. {{ ngx.var.remote_addr }}
Yes, the template expansion would be performed in the rewrite phase. As for how to template, the plan is to use Liquid, the same format 3scale uses on the developer portal and for the nginx configuration of APIcast. I plan to work on this and have a prototype by the end of this week.
> I understand. So in the access method, we have to check whether configuration was changed or not, and call on the fly method like set_conn. Regarding connection, set_delay is insufficient. Regarding fixed window, there are no on the fly methods. We should add these methods to lua-resty-limit.
I think we might not need that. If both the connection and fixed window rate limiters used increments, then the only other variables would be the limit and the time window.
I think we should encode the time window into the key itself. For example, for UNIX timestamp `1526378712` and time window `3600`, we would round it down to `1526378400` and use it as part of the key: `1526378400_{{ngx.var.remote_addr}}`. Then changing the time window would also change the key, and we would not have to worry about changing the expiration. This could be done by another interpolation, by some flag in the configuration, or by scope.
I'd really try to stick just using increments, so we don't have to solve the hard problem of synchronizing changes across the fleet of APIcast instances.
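The rounding described above is simple modular arithmetic (a Python sketch with a hypothetical `window_key` helper, using the numbers from the example):

```python
def window_key(timestamp, window, base_key):
    """Encode the time window into the rate-limiting key by rounding
    the UNIX timestamp down to the start of the current window, as
    proposed above. Changing the window changes the key itself, so no
    stored expiration needs to be adjusted."""
    window_start = timestamp - timestamp % window
    return "%d_%s" % (window_start, base_key)

# The example from the discussion: timestamp 1526378712, window 3600.
assert window_key(1526378712, 3600, "{{ngx.var.remote_addr}}") \
    == "1526378400_{{ngx.var.remote_addr}}"
# A different window yields a different key for the same timestamp:
assert window_key(1526378712, 60, "k") == "1526378700_k"
```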
I made a simple patch https://github.com/openresty/lua-resty-limit-traffic/pull/34 to change it to increments. I'd appreciate some review scrutiny especially regarding atomicity and race conditions.
https://github.com/3scale/apicast/pull/758 updates APIcast to use that forked version.
Can we close this by #703?
The rate limit policy has some limitations when configured to use Redis.
For example:
- A `fixed window` limit for a key, with a given `count` and `window`.
In order to solve these problems, we need to store more information in Redis and perform some checks every time APIcast boots with the rate limit policy configured. For example, the policy needs to check the existing limits and see whether it needs to increase the TTL of the limiting keys, decrease it, remove them, etc.