TykTechnologies / tyk

Tyk Open Source API Gateway written in Go, supporting REST, GraphQL, TCP and gRPC protocols
Other
9.58k stars 1.08k forks source link

Concurrency issues in rate limiting and quota support #40

Closed camann9 closed 9 years ago

camann9 commented 9 years ago

The rate limiting and quota data is integrated in the session (key) objects. This leads to various problems, see the following scenarios.

Scenario A, two tyk nodes:

Scenario B, tyk node and tky analytics:

These scenarios are quite likely to happen in high-load environments and apply to both quotas and rate limits (since they are essentially the same thing). A fix would be to remove the rate limit information from the session object and put it into a quickly expiring redis key (or two keys because we basically have two rates, one for short term and one for long term).

See the "Pattern" part in the redis documentation for INCR that discusses exactly this issue: http://redis.io/commands/incr

camann9 commented 9 years ago

Thanks :-) . It's good that you reference the issue from the commit.

lonelycode commented 9 years ago

I actually realised in the shower this morning that this commit doesn't fully solve the issue anyway, since we set the value of the Quota if the key is not found, strictly speaking, Scenario A still applies since a concurrent set of requests could set the same key to a maximum, now this isn't too much of an issue, but theoretically speaking you could bypass quotas with some clever timing.

I'm considering switching it around to be an INCR instead of a DECR, then there's no SET required (then just subtract Max from Remaining to update the session cache, however the race condition that is mentioned in the pattern applies where there is a risk of a floating key, though I feel it's unlikely to occur.

I think Scenario B is solved since the rate limiter uses the Redis store as the "truth" and the session store as a record, so if a password reset resets the quota counter, if the key is already present then it will just DECR (or INCR, as the case will be), the actual value in the session is irrelelvant except for in the reply and health-checks, which can't be guaranteed values.

Will send some more time on this today.

lonelycode commented 9 years ago

Tests passing, quota and rate limiter now redis based with finer-grained quota reset commands in the API. :-)

camann9 commented 9 years ago

Awesome. Just a few things:

lonelycode commented 9 years ago
  1. Yes
  2. QuotaRenews is used to report back to health check headers, it is also used to set the initial rate limit. User can set it manually, updating a session via API will reset the quota in Redis (unless new param ?suppress_reset=1 is sent with request), this basically means quota renewals can be scripted to occur whenever if needed). This was actually how it all behaved, so we're preserving exiting functionality, but adding a workaround for session updates that shouldn't affect quota.
  3. Yes it can, though not a priority
  4. Eek! Fixed in master.
  5. Leaving it as is for now
camann9 commented 9 years ago

On 2.: QuotaRenews is not set any more in IsRedisQuotaExceeded. Thus it is not updated any more anywhere (since IsQuotaExceeded is not used any more). Or am I overlooking anything?

lonelycode commented 9 years ago

You're right, it isn't set, and really should be, it's used in reporting current session state back to the user through the rate-limit endpoint and reply headers:

https://github.com/lonelycode/tyk/blob/b28c02a59e9deec2bba398b1b3bdc2b4c50b31cc/api.go#L1204

https://github.com/lonelycode/tyk/blob/77f4f7d5e23d78a46f4d02a07044c6402fc9163a/tyk_reverse_proxy_clone.go#L185

Have just pushed an update to master branch.

camann9 commented 9 years ago

Yeah, that solves it :-)