[UDP][Feature] UDP Content Token Routing

markmandel commented 4 years ago

(Please consider this document a sacrificial draft. Feedback, corrections and comments are very much appreciated, and likely warranted, as I am only newly experienced with Envoy.)

Objective

Be able to preemptively route a UDP session to a specific upstream entry in the cluster, based on content available (i.e. a token) within the UDP packet.

This is specifically useful for stateful endpoints in a cluster, such as a Dedicated Game Server for multiplayer games (which is my primary expertise), or the backends that VOIP/SIP services use (I believe).

For this reason, random or round-robin load balancing is not effective, as we need to be able to send a session to one specific upstream endpoint in the cluster.

Background

Articles

Wikipedia: Game Servers
- The best definition of a dedicated game server in writing.
UDP vs. TCP
- Why UDP is so important for realtime multiplayer games

Presentations

Scaling Multiplayer Game Servers with Kubernetes
- First introductory section of this presentation covers the general architecture for multiplayer game servers.
Denial of Service Mitigation (Valve @ GDC)
- Good discussion of applying proxies to multiplayer, dedicated game servers and the problems they can be applied to.

Requirements and scale

Requirements:

Be able to add and remove “client tokens” (arbitrary byte[]/string) to and from the set associated with an upstream cluster endpoint
Envoy should have a configurable way to pull the client token from the incoming UDP packet contents. E.g. The token could be the last 1024 bytes of the UDP packet.
- We have to pass the token this way, as UDP packets don’t have headers, so any extra information must be part of the byte[] payload of the UDP packet.
When a UDP packet is received by Envoy, it will:
- Parse the client token out of the packet, based on the client token configuration above
- Compare the token against the token sets of the upstream cluster endpoints to find the one it matches
  - If a match is found:
    - Configure a session to the matching upstream cluster endpoint, so that data can be sent back to the sending downstream client,
    - Send the UDP packet to the matched upstream endpoint.
    - If the token matches to a different upstream endpoint than previously, move the current session to the new upstream endpoint.
  - If there is no match, drop the packet, and end processing.
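The packet handling steps above can be sketched roughly as follows. This is a minimal illustration in Python, not Envoy code: the `TokenRouter` class, its method names, and the fixed `TOKEN_LENGTH` of 5 bytes taken from the end of the packet are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port) of an upstream cluster endpoint
Client = Tuple[str, int]    # (ip, port) of the sending downstream client

TOKEN_LENGTH = 5  # hypothetical: the proposal leaves token extraction configurable


class TokenRouter:
    """Hypothetical model of the proposed routing, not an Envoy API."""

    def __init__(self, token_length: int = TOKEN_LENGTH) -> None:
        self.token_length = token_length
        self.tokens: Dict[bytes, Endpoint] = {}     # client token -> endpoint
        self.sessions: Dict[Client, Endpoint] = {}  # client -> current endpoint

    def add_token(self, token: bytes, endpoint: Endpoint) -> None:
        self.tokens[token] = endpoint

    def remove_token(self, token: bytes) -> None:
        self.tokens.pop(token, None)

    def route(self, client: Client, packet: bytes) -> Optional[Endpoint]:
        """Return the endpoint to forward to, or None to drop the packet."""
        if len(packet) < self.token_length:
            return None  # too short to carry a token: drop
        token = packet[-self.token_length:]  # assumed: token is the packet suffix
        endpoint = self.tokens.get(token)
        if endpoint is None:
            return None  # no match: drop the packet and end processing
        # Create the session, or move it if the token now maps elsewhere.
        self.sessions[client] = endpoint
        return endpoint
```

Calling `route` twice for the same client after the token has been re-pointed at another endpoint moves the session, which is the behaviour the last "match" bullet describes.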

Use Cases

The specific use case that I want to cover is around Dedicated Game Servers for multiplayer games, but could potentially be applied to any sort of stateful system that uses a UDP stream as a communication protocol.

Can run Game Server on a private network, and only expose the Envoy proxy, thus reducing the surface area that is available to potential attackers.
Can have fine-grained, real-time control over who can access Game Servers, and which ones, through client token addition and removal
- This means that bad actors can have their client tokens removed quickly, removing their access to the most sensitive part of multiplayer games infrastructure
Dedicated Game Servers are usually a single public IP and port, and can be single points of failure for a single multiplayer game session - and as such, are targeted for DDOS attacks.
- Clients can distribute their traffic to multiple proxies, which is much harder to DDOS and take down, and provides redundancy.
Reduce public facing IP addresses (more of a problem for ipv4 than ipv6)
Standard UDP traffic statistic reporting through Envoy’s statistics collection.

Concerns / Questions

Is there a way we can do this without sending the token on every request? Since it may not be encrypted, the token can be seen in traffic. Maybe we could only send the token on initial request / network change? (or maybe we need to also have encryption?)
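One possible shape for "token only on the initial request", sketched in Python under stated assumptions: the function name `route_with_session`, the 5-byte suffix token, and keying sessions by the client's (ip, port) are all hypothetical and not part of the proposal.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (ip, port)


def route_with_session(
    sessions: Dict[Addr, Addr],
    tokens: Dict[bytes, Addr],
    client: Addr,
    packet: bytes,
    token_length: int = 5,  # hypothetical token size
) -> Optional[Addr]:
    """Route by existing session first; fall back to the token otherwise."""
    known = sessions.get(client)
    if known is not None:
        return known  # established session: the packet needs no token
    # No session yet (first packet, or the client's address changed):
    # the packet must carry a valid token as its suffix.
    token = packet[-token_length:] if len(packet) >= token_length else b""
    endpoint = tokens.get(token)
    if endpoint is not None:
        sessions[client] = endpoint
    return endpoint  # None means drop
```

The trade-off in this sketch is that removing a token no longer cuts off an established session on its own; revocation would also have to clear the matching session entries, and the source address becomes the only credential for later packets.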

Design ideas

This is a sacrificial draft for a potential configuration for the content token routing:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: UDP
          address: 127.0.0.1
          port_value: 7650
      listener_filters:
        # our new type of udp router
        - name: envoy.filters.udp_listener.udp_router
          typed_config:
            '@type': type.googleapis.com/envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig
            stat_prefix: service
            cluster: gameservers_cluster
  clusters:
    - name: gameservers_cluster
      connect_timeout: 0.25s
      type: STATIC
      # since our listener filter provides the routing
      lb_policy: CLUSTER_PROVIDED
      load_assignment:
        cluster_name: gameservers_cluster
        endpoints:
        # three potential game servers to connect to on localhost
        # but different ports.
        - lb_endpoints:
            - endpoint:
                metadata:
                  # client tokens are stored in the metadata, as struct key values
                  # When `true`, the token has access, when false or non existent, access is denied.
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    x7zs9: true
                    18z9y: true
                    j9zwk: true
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26000
        - lb_endpoints:
            - endpoint:
                metadata:
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    97zx9: true
                    18zyy: false # this client-token no longer has access
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26001
        - lb_endpoints:
            - endpoint:
                metadata:
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    97ix0: true
                    16zyy: true
                    p6z9y: true
                    f6z3y: true
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26002
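To make the metadata semantics concrete, here is a rough Python sketch of how a filter might turn the per-endpoint token structs into a lookup table. The `ENDPOINTS` list is a hand-written stand-in for the first two endpoints of the draft config (no YAML is parsed), and `build_token_index` is a hypothetical helper, not an Envoy function.

```python
from typing import Dict, List, Tuple

TOKENS_KEY = "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens"

# Hand-written stand-in for the first two endpoints of the draft config.
ENDPOINTS: List[dict] = [
    {
        "address": ("127.0.0.1", 26000),
        "metadata": {TOKENS_KEY: {"x7zs9": True, "18z9y": True, "j9zwk": True}},
    },
    {
        "address": ("127.0.0.1", 26001),
        "metadata": {TOKENS_KEY: {"97zx9": True, "18zyy": False}},
    },
]


def build_token_index(endpoints: List[dict]) -> Dict[str, Tuple[str, int]]:
    """Map every token whose value is true to its endpoint address."""
    index: Dict[str, Tuple[str, int]] = {}
    for endpoint in endpoints:
        token_struct = endpoint.get("metadata", {}).get(TOKENS_KEY, {})
        for token, allowed in token_struct.items():
            if allowed is True:  # false or missing means access is denied
                index[token] = endpoint["address"]
    return index
```

With this index, `18zyy` is absent because its value is `false`, so a packet carrying it would be dropped.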

Concerns / Questions

Game Servers / Upstream client endpoints could potentially be added and removed in a very dynamic way (100’s added and 100’s removed at a time). Can Envoy handle this type of dynamic configuration rate of change?
A single game session could have potentially thousands of players per endpoint. That means that every upstream cluster endpoint could have thousands of client tokens associated with it. Can Envoy handle this extra amount of data?
Client tokens will be added and removed at a rate much higher than that of upstream endpoints. Can Envoy handle this type of dynamic configuration rate of change?

Alternatives considered

Being able to somehow preemptively create sessions based on sender IP/port information.
- At initial pass, couldn’t find a way to implement this
- Also, with games on mobile networks, especially - network changes are far more frequent than PC/Consoles. (Although you may have to re-auth anyway? May be worth discussion)
Being able to provide a token that identifies the upstream cluster endpoint specifically
- This is a potential security concern, as any client with the upstream endpoint token has access, and you can’t revoke it
- In reality, with the current design, you could do this anyway if you wanted to, but using a single token per endpoint.

luna-duclos commented 4 years ago

One thing I'd like to see here is to be able to have a sort of fallback routing for when no specific token is configured.

luna-duclos commented 4 years ago

I'd also like to add the explicit consideration that tokens could be any length and envoy shouldn't be opinionated on that.

mattklein123 commented 4 years ago

Thanks for raising this @markmandel. This is actually a more general case of what needs to be done for https://github.com/envoyproxy/envoy/issues/1193 in which we need to route UDP packets based on the QUIC connection ID. I have some thoughts on how we can approach this and will reply back when I have some more time. cc @danzh2010

markmandel commented 4 years ago

@mattklein123 glad to hear it has a more general application than the use cases I am thinking of as well.

I didn't think to look at how QUIC implements this! :man_facepalming: There is so much good prior art there for a variety of use cases (sessions, crypto, etc).

https://quicwg.org/base-drafts/draft-ietf-quic-transport.html#name-connections (for those also subscribed who want to read up)

beriberikix commented 4 years ago

As another data point, IoT protocols also rely on tokens for routing - and other things, like request/response matching, caching and congestion control. CoAP (rfc7252) is based on UDP and one I'm particularly interested in seeing work with Envoy. The way CoAP uses tokens is slightly different than the way Mark/QUIC is describing (it's more a request ID) but hopefully helpful in thinking about a generalized solution.

chadr123 commented 3 years ago

I would like to support a hash policy in the udp proxy.

The udp proxy does not fully support hash-based lb algorithms because it does not provide a LoadBalancerContext when choosing a host. So, the udp proxy with a hash-based lb algorithm will select a host at random.

I investigated the tcp case and found that it has a hash policy option. So, I think we can simply support it in the udp case as well.

This does not depend on the incoming packet's content.
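As a rough illustration of the difference (Python, not Envoy code): without a hash key the only option is a random pick, while a key such as the source IP gives a stable choice. The function names are hypothetical, and the plain modulo hashing here is a simplification that stands in for whatever hash-based lb algorithm the cluster uses.

```python
import hashlib
import random
from typing import List

HOSTS = ["127.0.0.1:26000", "127.0.0.1:26001", "127.0.0.1:26002"]


def pick_random(hosts: List[str]) -> str:
    """What a hash-based balancer falls back to when it gets no hash key."""
    return random.choice(hosts)


def pick_by_source_ip(hosts: List[str], source_ip: str) -> str:
    """Stable pick: the same source IP always maps to the same host."""
    digest = hashlib.sha256(source_ip.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]
```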

Here is my draft implementation: chadr123@d95c3f5

Please share your opinions on my idea. Thanks!!

mattklein123 commented 3 years ago

@chadr123 can you open a PR where we can discuss? I want to make sure we build the API in a way that will allow for byte range hashing. I think this can just be a wrapper message with a oneof inside of it that initially just has the general hash policy, and then later we can add byte range hashing on the datagram. Thank you!
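To picture the "wrapper with a oneof" idea, here is a small Python model. It is only a sketch: `HashPolicy`, its `source_ip` and `byte_range` fields, and the `(offset, length)` encoding are assumptions for illustration and do not describe the actual Envoy API.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HashPolicy:
    """Hypothetical wrapper: exactly one variant may be set, like a oneof."""

    source_ip: bool = False
    byte_range: Optional[Tuple[int, int]] = None  # (offset, length), offset >= 0

    def __post_init__(self) -> None:
        if self.source_ip == (self.byte_range is not None):
            raise ValueError("exactly one of source_ip or byte_range must be set")


def hash_key(policy: HashPolicy, source_ip: str, datagram: bytes) -> Optional[int]:
    """Return the hash for this datagram, or None if the range does not fit."""
    if policy.source_ip:
        material = source_ip.encode("utf-8")
    else:
        offset, length = policy.byte_range
        material = datagram[offset:offset + length]
        if len(material) < length:
            return None  # datagram too short for the configured range
    return int.from_bytes(hashlib.sha256(material).digest()[:8], "big")
```

Starting with only the `source_ip` variant and adding `byte_range` later would not change the shape of the wrapper, which is the point of the suggestion.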

chadr123 commented 3 years ago

Ok. I will open a PR soon. :)

ggreenway commented 3 years ago

In addition to the work that @chadr123 is planning to do, if that is combined with a filter similar to header-to-metadata from http and #12594, token-based routing will work.

ronaldfenner commented 2 years ago

I'd like to suggest an additional way for at least the games use case.

Where @markmandel has the client token as part of the UDP packet, one could include a server token as well. I know he expressed in his videos covering this that he didn't like the idea, but I think that applied only to using just a server token.

I would suggest the client auth token and the server token. The server token is what's used to route to the upstream server while the client auth token is used to allow or drop the packet.

The client token could be manually added to the config, or another way would be to offload the authorization to an external auth server. If the external service approves the client token, it's mapped to its network tuple, such that further packets with that same tuple and client token would be passed through without sending out for the auth call. There could be a TTL on the token so that after x seconds the auth service is called again to see if it's still valid. Also, if the network tuple doesn't match the stored tuple for the client token, then the client token would be reauthorized.

If reauthorization of a client token fails after a network tuple mismatch, allow an option to keep the current mapped token, drop the mapped token, or reauthorize the mapped token with the attached network tuple.
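A rough Python sketch of the caching behaviour described above. Everything named here is hypothetical: `AuthCache`, the `authorize` callback standing in for the external auth service, and the injectable `clock`. On a failed reauthorization this sketch picks the "drop mapped token" option; the other two options would replace the last two lines of `allow`.

```python
import time
from typing import Callable, Dict, Tuple

Addr = Tuple[str, int]  # network tuple, simplified to (ip, port)


class AuthCache:
    """Hypothetical cache of approved client tokens, not an Envoy API."""

    def __init__(
        self,
        authorize: Callable[[bytes], bool],  # stand-in for the external auth call
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._authorize = authorize
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[bytes, Tuple[Addr, float]] = {}  # token -> (tuple, expiry)

    def allow(self, token: bytes, client: Addr) -> bool:
        now = self._clock()
        entry = self._entries.get(token)
        if entry is not None:
            cached_client, expires_at = entry
            if cached_client == client and now < expires_at:
                return True  # same tuple, TTL not expired: skip the auth call
        # Unknown token, expired TTL, or changed network tuple: (re)authorize.
        if self._authorize(token):
            self._entries[token] = (client, now + self._ttl)
            return True
        self._entries.pop(token, None)  # assumed option: drop the mapped token
        return False
```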

This implementation alleviates some of his concerns about the number of tokens/sessions per endpoint, in that the server token wouldn't change that often and there would be fewer of them than one per client session. With sessions having a TTL for the token, it would cut down on the rate of them being added and removed, as a typical game session usually lasts for a bit, and by not having to authorize every packet it would cut down on the lag introduced by calling the auth service.

Also, it would be nice if there was a way to call an API for the listener to remove/update the token info. For example, when a user logs out of their game, the service handling the logout could call the listener to remove the client token.

I also don't see in his proposed config how to specify where the token would be found in the UDP packet. It should probably be part of the typed config. With my proposed addition you could have client_token: 0:3 server_token: 4:6. By giving a byte range of where to extract the data, the filter can just use the values to extract the tokens. It could even support negative ranges that would start from the end of the packet instead.
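A small Python sketch of that range extraction. The `start:end` string format comes from the example above; treating both bounds as inclusive (so `0:3` is four bytes) and counting negative indices from the end of the packet are my assumptions, and `parse_range` and `extract` are hypothetical helper names.

```python
from typing import Optional, Tuple


def parse_range(spec: str) -> Tuple[int, int]:
    """Parse "start:end" into two integers, e.g. "0:3" or "-3:-1"."""
    start, end = spec.split(":")
    return int(start), int(end)


def extract(packet: bytes, spec: str) -> Optional[bytes]:
    """Return the bytes in the inclusive range, or None if it does not fit."""
    start, end = parse_range(spec)
    size = len(packet)
    if start < 0:
        start += size  # negative index counts from the end of the packet
    if end < 0:
        end += size
    if not (0 <= start <= end < size):
        return None  # range falls outside this packet
    return packet[start:end + 1]
```

For a 10-byte packet `b"ABCDEFGHIJ"`, `extract(packet, "0:3")` gives the first four bytes and `extract(packet, "-3:-1")` gives the last three.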

envoyproxy / envoy