nicholaspcr closed this issue 2 years ago
What was initially thought to be a bug is in fact the expected behaviour of the GCRA algorithm implemented in the throttled package, meaning that our documentation gives information that leads to a wrong interpretation of how the rate limiting works.
When doing something like:

rate-limiting:
  profiles:
    - name: gRPC API
      max-per-min: 30
      max-burst: 10
      associations:
        - grpc:method
        - grpc:stream:accept
The max-per-min setting defines the rate at which requests can be processed by the service, while max-burst defines a leeway on top of the configured rate that a service can consume at once; the burst value is exposed in the x-rate-limit-limit header.
The situation described in the issue suffices as an example: imagine a gRPC limitation with max-per-min equal to 3 and max-burst left unspecified, meaning that it receives the value of max-per-min. That would mean gRPC can handle one request every 60/3 = 20 seconds and starts with a leeway of 3 requests. With this in mind, the following case might make it easier to visualize how it works.
Two requests are made instantly, consuming part of the available requests set by max-burst. The x-rate-limit-available will now be at 1 and the x-rate-limit-reset will report a value of 40, meaning that it will take 40 seconds to fully replenish the leeway. After waiting 20 seconds the x-rate-limit-available amount increases by one; that only happens if no request is processed in that time.
So in theory the maximum amount of requests that could be handled in one minute is max-burst + max-per-min, which is consistent with the behaviour found in the logs.
For more detailed information on this behaviour, the following links are good resources:
In short, the description of the x-rate-limit headers found here might lead to a wrong interpretation of how the rate limit works. After reflecting on the issue, the solution seems to be simply elaborating a bit more on the behaviour of the rate limit in the documentation. But I wanted to know your opinion on the subject, @KrishnaIyer and @adriansmares.
To make the values of the headers more explicit:
- X-Rate-Limit-Limit: the maximum number of burst requests allowed.
- X-Rate-Limit-Available: the number of burst requests available for the current time period.
- X-Rate-Limit-Reset: seconds until the available amount of burst requests reaches the limit.
- X-Rate-Limit-Retry: seconds until the rate limiter accepts another request; only contains a value when available is zero.

So x-rate-limit-reset is the amount of time you have to wait until the available count resets to the original max-per-min value, i.e., 3 in your example?
Also, what about x-rate-limit-retry? That's not used then?
So in theory the maximum amount of requests that could be handled in one minute is max-burst + max-per-min, which is consistent with the behaviour found in the logs.
That doesn't sound right. From what I understand, the burst should be included within the max-per-min, and if someone makes a burst request, it must be deducted from the available value for the current window.
Can you run some examples with request timestamps?
I was trying to avoid the word resets due to the behaviour of the algorithm: there isn't a hard reset of the requests allowed per minute, but rather a slow refill of the available amount of requests that can be made. So if one user sleeps for the amount of seconds specified in x-rate-limit-reset, it doesn't necessarily mean they will be able to make a request; as long as another user keeps making requests at the same pace as the rate of recovery, the first user will not be able to make a request.
The x-rate-limit-retry header only contains a value when x-rate-limit-available is 0; its value represents the amount of time one has to wait until x-rate-limit-available goes from 0 to 1 (when the next request will be permitted).
So in theory the maximum amount of requests that could be handled in one minute is max-burst + max-per-min, which is consistent with the behaviour found in the logs.
That doesn't sound right. From what I understand, the burst should be included within the max-per-min and if someone makes a burst request, it must be deducted from the available value for the current window.
From the comments in throttled's code, MaxBurst defines the amount of requests that are allowed to exceed the rate in a single moment; it doesn't imply that the rate at which requests become available is affected as a result.
#!/bin/bash
for run in {1..8}; do
go run ./cmd/ttn-lw-cli g get gtw1
done
for run in {1..6}; do
go run ./cmd/ttn-lw-cli g get gtw1
sleep 12 # recovery rate -> 60s / 5req-per-min
done
tls:
  source: file
  root-ca: ./ca.pem
  certificate: ./cert.pem
  key: ./key.pem
rate-limiting:
  profiles:
    - name: gRPC API
      max-per-min: 5
      max-burst: 7
      associations:
        - grpc:method
        - grpc:stream:accept
From the script, the first 9 requests are made without any delay between them, going beyond the MaxBurst of 7 specified in the stack config. One would expect the 8th request to be the one to fail, but in the logs the 9th request is the one that presents the error; observing the time difference between the requests makes it easier to see why.
Between 2022-01-13 09:57:19.900649828 and 2022-01-13 09:57:32.474718184 more than 12 seconds elapse. max-per-min is 5, creating a recovery rate of one request every 12 seconds. The logs follow this logic: the 8th request doesn't fail, while the 9th does, since it is made without any delay. The following requests don't fail thanks to the 12-second sleep, meaning that the server always makes another request slot available within that time span.
In short, between 2022-01-13 09:57:19.900649828 and 2022-01-13 09:57:47.708527296 there are 10 requests being made.
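For illustration, the 8-pass/9th-fail pattern can be reproduced with a tiny GCRA simulation. This is a sketch, not the throttled implementation, and it assumes roughly 1.8 seconds of go run overhead between invocations (the real logs show uneven gaps, but the average is in that range):

```go
package main

import "fmt"

func main() {
	const (
		interval = 12.0 // seconds: 60s / max-per-min (5)
		burst    = 7.0  // max-burst from the stack config
	)
	tat := 0.0 // GCRA theoretical arrival time, in seconds
	// Nine back-to-back invocations, ~1.8s apart due to go run overhead.
	for i := 0; i < 9; i++ {
		now := float64(i) * 1.8
		if tat < now {
			tat = now
		}
		if tat-now <= (burst-1)*interval { // conforming request
			tat += interval
			fmt.Printf("request %d at t=%.1fs: ok\n", i+1, now)
		} else {
			fmt.Printf("request %d at t=%.1fs: rate limited\n", i+1, now)
		}
	}
}
```

With these assumed timings the first 8 requests conform (the gap between the first and eighth requests is enough to replenish one slot) and the 9th is rejected, consistent with the logs.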
So I read a little bit here and you're partially correct. Here's what happens (I've verified this with testing):
1. If max-per-min is 5, then 60/5 = 12s: a new request slot becomes available once every 12 seconds.
2. max-burst allows bursting on top of MaxPerMin. So if max-burst is 7, you can make 7 requests continuously within the first 12s without any gap. But after you make a burst of requests and exhaust your burst limit, you have to:
a. wait for X-Rate-Limit-Retry until you can make the next single request, or
b. wait for X-Rate-Limit-Reset until you can make the next maximum burst (i.e., make 7 consecutive requests).
So effectively, the max number of requests per minute is max-per-min + max-burst - 1 (because one of the requests is part of the burst).
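The max-per-min + max-burst - 1 figure can be checked with a quick simulation under the same illustrative GCRA model (again a sketch, not the throttled code): a client that attempts one request per second with max-per-min: 5 and max-burst: 7 gets exactly 11 requests through in the first minute.

```go
package main

import "fmt"

func main() {
	const (
		interval = 12.0 // 60s / max-per-min (5)
		burst    = 7.0  // max-burst
	)
	tat := 0.0   // GCRA theoretical arrival time, in seconds
	allowed := 0 // requests that conform within the first minute
	for now := 0.0; now < 60; now++ { // attempt one request per second
		if tat < now {
			tat = now
		}
		if tat-now <= (burst-1)*interval { // conforming request
			tat += interval
			allowed++
		}
	}
	fmt.Println("requests allowed in the first 60s:", allowed)
}
```

The simulation lets 7 requests through immediately, then one more each time a 12-second slot refills (at t=12, 24, 36 and 48), for a total of 11 = 5 + 7 - 1.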
I think we made a mistake in our code. I don't know why we set the default burst to max-per-min - 1; https://github.com/TheThingsIndustries/lorawan-stack/blob/fd2aba958be1625bec5e661af5c1d8fb416872a5/pkg/ratelimit/config.go#L93
Because now, if the max-per-min is 60 and we leave the max-burst empty, we allow up to 60 requests in one shot and then one request each second for the next 59 seconds. However, the client must then wait (do nothing) for the next 60s before it can send the next burst of 60.
So the default max-burst must be 1, which effectively disables bursting, and we need to selectively allow bursting for some RPCs.
Also, by the way, it's theoretically an invalid setting to have max-burst > max-per-min, since that violates the max constraint.
What do you think @adriansmares ?
The throttled godoc explains the meaning of the headers, as we copy those directly: https://github.com/throttled/throttled/blob/ef1aa857b069ed60f6f859f8b16350e5b7c8ec96/rate.go#L29-L58
I think we made a mistake in our code. I don't know why we set the default burst to max-per-min - 1; https://github.com/TheThingsIndustries/lorawan-stack/blob/fd2aba958be1625bec5e661af5c1d8fb416872a5/pkg/ratelimit/config.go#L93
Because lacking a burst value results in a terrible user experience. Outside of the APIs which are on the hot path (i.e. AS webhook downlinks), you don't really use them at a steady state: every once in a while, you log into the Console, do 10 different API calls in parallel, finish looking and leave. Most relevant workloads are actually just bursts; imagine having to wait 5 seconds while switching between pages in the Console end device view.
Because now, if the max-per-min is 60 and we leave the max-burst empty, we allow up to 60 requests in one shot and then one request each second for the next 59 seconds. However, the client must then wait (do nothing) for the next 60s before it can send the next burst of 60.
But why is this a bad thing? I don't follow why respecting the rate after spending the burst is wrong.
So the default max-burst must be 1, which effectively disables bursting, and we need to selectively allow bursting for some RPCs.
I would invert this: lower the burstable amount per period for the APIs which require it. Not having bursting by default would be terrible, as you would need to wait for the period between each call, even if you just need a quick answer and that's it.
I don't follow why after spending the burst, respecting the rate is wrong.
Because the effective rate is max-per-min + max-burst - 1, which defeats calling the first parameter max-per-min. That parameter should've been called req-per-min, but we're beyond that point.
I would invert this: lower the burstable amount per period for the APIs which require it.
I second this.
Not having bursting by default would be terrible, as you would need to wait for the period between each call,
That's not true in all cases. For example, with gateway claiming we allow 20 claims/min. I don't see why one needs to make 20 requests in one shot and then be rate-limited. I'd rather spread out these calls and have clients add a delay between calls for batch operations, especially since the Semtech RJS does not handle floods very well.
I think we both agree that some APIs should allow bursting. We just have a different idea for the default case. I think bursting should be allowed conservatively and your idea is that we allow bursting by default and clamp where necessary. I think they're both equally valid alternatives, so I'm fine with keeping the burst behaviour we have now.
The throttled godoc explains the meaning of the headers, as we copy those directly: https://github.com/throttled/throttled/blob/ef1aa857b069ed60f6f859f8b16350e5b7c8ec96/rate.go#L29-L58
The descriptions in the throttled comments and in our documentation are quite similar; the cause of my confusion was that what I expected the headers to be didn't exactly match the max-per-min configuration field, but it is as @KrishnaIyer mentioned:
Because the effective rate is max-per-min + max-burst - 1 which defeats calling the first parameter max-per-min. That parameter should've been called req-per-min but we're beyond that point.
The two ideas for the default case are both sensible, but I'm personally more inclined to keep the current burst behaviour, since changing to a more conservative burst policy might be an unpleasant surprise for users who haven't specified the burst amount but rely on bursts that slightly exceed the request rate.
@adriansmares and @KrishnaIyer, with all of this in mind I'm wondering if there is in fact something that should be changed. From my perspective the header documentation does give enough information. I don't think details like the algorithm used for the rate limiting are necessary in the documentation, since its objective is to clarify the usage of TTS.
Because the effective rate is max-per-min + max-burst - 1, which defeats calling the first parameter max-per-min. That parameter should've been called req-per-min, but we're beyond that point.
I see it now. Correct.
@adriansmares and @KrishnaIyer, with all of this in mind I'm wondering if there is in fact something that should be changed. From my perspective the header documentation does give enough information. I don't think details like the algorithm used for the rate limiting are necessary in the documentation, since its objective is to clarify the usage of TTS.
I indeed think that the only thing that should be done here is to pay short lip service to the GCRA algorithm (i.e. add a link) in the docs, but otherwise indeed no bug here.
As part of our operations we may selectively turn off bursting for certain calls - we have https://github.com/TheThingsIndustries/lorawan-stack-aws/issues/544 for that.
Summary

The values of the rate limiting headers are not compliant with the available documentation. The behaviours observed were:
- The x-rate-limit-reset value increases through time instead of being a decreasing value.
- The x-rate-limit-available header has the same value in different requests within the same minute.

Steps to Reproduce

1. Add a debug log of the rate limiting result in the interceptor; log.FromContext(ctx).Debug("Result: ", result) would suffice.
2. Run the stack with rate-limiting enabled.
3. Run go run ./cmd/ttn-lw-cli g get <gateway_id> with intervals of 10 seconds between each request. Script example below.

stack config - enables rate-limiting
Template script to run go run command multiple times
Example of logging rate-limiting values in interceptor

What do you see now?

In the cli logs you can see that the x-rate-limit-available header has the same value twice within the same minute and that the x-rate-limit-reset increases. The reduced version of the log only shows the values of the headers. The server logs contain the values of the result and limit variables that can be found in: https://github.com/TheThingsNetwork/lorawan-stack/blob/7a908e9f732cab737c36980fcde240e5cda7a942/pkg/ratelimit/grpc.go#L77

What do you want to see instead?

An x-rate-limit-reset value that represents how many seconds are left until the reset happens, and an x-rate-limit-available that is unique to each request.

Environment

How do you propose to implement this?

Investigate what is causing the discrepancies between the header values and the server's limiter values: check the expected behaviours within the library used for the limiter and/or confirm the places where the values are manipulated to see if anything could be altering them.

How do you propose to test this?

Manually test it. Since there are already unit tests for adding the limiter values to the header, it might be worth doing something more end-to-end; not sure if this should be within the scope of this bug or not.

Can you do this yourself and submit a Pull Request?

Yes