Closed. htdvisser closed this issue 3 years ago.
In https://github.com/TheThingsIndustries/lorawan-stack/issues/546#issuecomment-500902816 @johanstokking suggested the following classes:
** including collaborator changes, API key changes, etc, which may need revocation, email notifications, etc
Network operators must have insight into why the server rejected requests because of rate limits.
Adding @neoaggelos to look into rate limiting on gateway connections to prevent bugs in the gateways (sending uplinks/statuses in a loop) from taking down the GS
@neoaggelos we need a flood check in the UDP firewall. This could be a simple ring buffer of the last 10 messages or so, and if the oldest (next) passed message is within a certain interval (on the order of tens of milliseconds), don't pass the message.
Do we not need something similar for the other frontends as well (mqtt/grpc/basicstation)? If so, the flood check maybe has to be implemented at the GS level (or at some point in between, but it should be frontend-agnostic).
For resending a bunch of messages from downstream (i.e. buffered on gateways, satellite groundstations, etc, see https://github.com/TheThingsNetwork/lorawan-stack/issues/2708), we do need to allow for a temporary burst and/or throttling, without returning errors.
I think we should turn this around and encourage any gateway maker to actually have some rate limiting on their side
I think we should turn this around and encourage any gateway maker to actually have
There’s some packet loss here; are you suggesting to move the responsibility to the gateway maker to not burst?
I think we should make it as easy as possible for the edge; if there’s a window of sending data in bulk, let them do it. We just need to figure it out with a message queue, potentially already in GS.
That's what I meant, but I pressed enter too quickly.
In retrospect, if the sending window is limited it makes sense to allow them to send everything at once, but maybe we could think of a special infrastructure endpoint (i.e. dedicated instances) for this - the PPS is very, very large from these buffered gateways and it would be a pity to have GS live-traffic performance be affected by this. I can imagine that satellite communications are almost never going to be live.
There's a difference between server-to-server upload of a lot of frames, i.e. from a satellite ground station, versus a single gateway that has a local buffer which it flushes.
For those ground stations, they should be using Packet Broker anyway, and not act as a gateway. Rate limiting works very differently there.
As for individual gateways that are flushing their local buffers, this should be handled by GS.
As discussed during the call today, we'll come up with separate APIs for bulk-offloading buffered uplinks. This means that we can now go back to the original issue, and implement a generic ratelimit package that lets us limit:
Depending on the place, we should think about what to do before/when rate limited:
As for configuration, I guess we can start with some "fallback" configurations of "[N] calls per [duration] per [vary]". We can add more specific limits for specific methods (lower limits on security-sensitive endpoints). Perhaps also nice if we can add dynamic limits (so that we can allow more calls per hour from specific IP addresses, or respond to a (D)DoS by tweaking limits for some endpoints).
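As a purely hypothetical sketch of what such configuration could look like (none of these keys exist in the actual stack; method names and values are placeholders), a fallback plus per-method overrides might be shaped like this:

```yaml
# Hypothetical sketch only; not the actual lorawan-stack config schema.
rate-limiting:
  fallback:
    limit: 100      # [N] calls
    window: 1m      # per [duration]
    vary: [ip]      # per [vary]
  methods:
    "/ttn.lorawan.v3.SomeService/SomeMethod":
      limit: 10     # stricter limit for a security-sensitive endpoint
      window: 1m
      vary: [ip, token]
```

Dynamic limits (per-IP allowances, emergency tweaks during a (D)DoS) would then amount to overriding entries of this structure at runtime.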
Moving this to Next Up until everything is clear, but please, @neoaggelos start working on this ASAP.
After looking into this issue in depth, and taking into consideration the existing comments and pointers, I think the implementation that makes the most sense is a `ratelimiting` package, possibly based on Throttled. It can use either local-memory or Redis record keeping. We should not enforce either (rather, make sure that both can be used).
Currently, I want to avoid making too strict decisions as to how configuration is applied. Let's start with something very basic, and then shape it up as the actual implementation requirements become more clear.
What is the status and action item here @neoaggelos ?
While working on this I have had to look at many parts of the codebase, and figure out how best to de-duplicate the rate limiting functionality. The cleaner solution so far seems to be to have a `pkg/ratelimit` package, and to use separate config per component/listener to set up rate limiting, instead of configuring rate limiting under a single section.
A challenging issue I have had to deal with during early iterations: The issue description and the comments suggest a complex rate limiting behaviour (e.g. rate limiting by IP address, request path and auth token ID). However, these checks should happen at different parts of the lifecycle of a request to make sense (e.g. rate limiting by IP address should be made as early as possible, before having to process the auth token ID).
A rough TODO:
- A `ratelimit` package, implementing the rate limiting logic, as well as rate limiter interfaces (for gRPC, echo, http)
- Configuration for the `ratelimit` package (but we should avoid breaking config changes)
- Rate limiting for unary gRPC calls (`UnaryServerInterceptor`)
- Rate limiting for streaming gRPC calls (`StreamServerInterceptor`)

Some work so far can be found in https://github.com/TheThingsNetwork/lorawan-stack/tree/feature/rate-limiting. @johanstokking @htdvisser please have a look if possible and comment on the general direction of the implementation on a high level. I could create a draft PR, but I think it's too early for that, especially if major changes are required.
Please open a draft PR so we can comment.
I do like that there's a `pkg/ratelimit` with common functionality that can be used by various places in the stack.
Note that the gateway (traffic) rate limiting is actually the least of our concerns now: we already have gateway upstream handlers that have implicit rate limiting, and we have the UDP firewall. That being said, I think it's a good idea to replace the current UDP firewall with two generic rate limiting components: limiting calls by remote IP and limiting hits (messages) per session (UDP remote address). What is necessary, and what is not covered by GS at the moment, is incoming connection rate limiting per remote IP.
@neoaggelos what is the status here?
@neoaggelos #3920 is merged, can this be closed?
With #3920 merged, The Things Stack has basic support for rate limiting.
Things that we still need to cover and/or consider:
Similar to the rest of the stack, apply rate limiting for downlinks that arrive via PubSubs (https://github.com/TheThingsNetwork/lorawan-stack/blob/v3.12/pkg/applicationserver/io/pubsub/pubsub.go#L180). The key can be `as:down:pubsub:app:<appUID>:pubsub:<pubsubID>`, and the rate limiting class can be `as:down:pubsub`.
Consider having separate rate limiting for HTTP requests before and after authentication. Currently, rate limiting is only applied after authentication, which does not protect from auth requests flooding the Identity Server.
Consider throttling connections (for Websockets, MQTT, gRPC, PubSubs) instead of terminating after the rate limit has been exceeded. The implementation here can be adding a static delay as penalty for exceeding the rate limit, similar to what happens with V2.
Consider ways to simplify configuration. The rate limiting profile configuration is documented (see https://github.com/TheThingsIndustries/lorawan-stack-docs/pull/290), but it may still not be straightforward to set up. My idea would be to keep the `profiles` section, where people can set advanced configuration, but also introduce a new section (e.g. `default`) where only a few constants need to be set. cc @KrishnaIyer, please propose how this config section could look.
Enable configuring IP address ranges (or subnets) for which rate limiting is not applied. This is to account for "internal" traffic or "trusted" IP addresses.
For the implementation, we could extend the `ratelimit.Profile` configuration with something like:

```go
ExcludeIPAddresses: []struct{
	Start net.IP
	End   net.IP
	// OR
	Network net.IPNet
}
```

We can then add a `(ratelimit.Resource).IPAddress() (net.IP, bool)` method (returning the source IP and a boolean indicating whether a source IP makes sense for the particular resource), and then use that from `(*ratelimit.Profile).RateLimit` to return immediately if this is an excluded IP address.
cc @htdvisser, @johanstokking. Let's turn this into an umbrella issue going forward, I will create issues for each task as needed, but let's discuss first.
1. Rate limiting downlinks from PubSubs
We're getting rid of them in TTSC and TTN. They remain for TTSOS but we don't need rate limiting for that.
2. Consider having separate rate limiting for HTTP requests before and after authentication
This is the job of our front proxy.
3. Consider throttling connections (for Websockets, MQTT, gRPC, PubSubs) instead of terminating after the rate limit has been exceeded
Yes please.
4. Consider ways to simplify configuration
Haven't looked at it yet. But how about sane defaults?
5. Enable configuring IP address ranges (or subnets) for which rate limiting is not applied
We need this too. You probably just want a slice of IP masks.
@neoaggelos can you please post a status update here? What's done and what's still TODO?
We recently had quite some issues with MQTT on V2, so I'm also interested in hearing about MQTT connection limits. Can we already limit the number of active MQTT connections (per username, per IP)?
For Community and Cloud, we do limit rate of new connections. Limiting the maximum number of concurrent connections is basically coming with https://github.com/TheThingsIndustries/lorawan-stack/pull/2879
- Enable configuring IP address ranges (or subnets) for which rate limiting is not applied
We need this too. You probably just want a slice of IP masks.
This has been implemented in https://github.com/TheThingsNetwork/lorawan-stack/tree/feature/rate-limiting-config-subnets, along with unit tests to verify the behaviour. If the implementation seems good enough, I can create a PR for it (otherwise, someone else can pick it up).
Summary
We have to add rate limiting to our API endpoints.
Transferred from https://github.com/TheThingsIndustries/lorawan-stack/issues/546
Why do we need this?
What is already there? What do you see now?
We currently don't have any rate limits.
What is missing? What do you want to see?
How do you propose to implement this?
For the actual implementation we can just use an existing library like https://github.com/throttled/throttled, which already has support for both in-memory and shared Redis record keeping.
The "key" for rate limiting would be `<class>:<limitBy>:<IP/AccessTokenID/TargetEntityID>`. We still need to define the rate limiting classes.

For (1): when we have #443, valid access tokens are cached. This means that we can put the rate limiting on the func that verifies non-cached access tokens (the `token` class above). We also need to put IP rate limiting on the username/password login of the OAuth server (the `login` class above).
For (2) we need to define for each RPC whether we limit by access token ID or by target entity ID, and which rate limiting class should be applied.