envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Envoy Reverse Connections: Communicate with downstream envoy behind a private network. #33320

Open basundhara-c opened 6 months ago

basundhara-c commented 6 months ago

Envoy reverse Connections: Communicate with downstream envoy behind a private network.

Description:

This is a design proposal for using envoy proxy to access downstream services in a private network from apps in a public network by re-using client connection sockets. We have the following setup at Nutanix for which we implemented this solution:

(Image: reverse connections scenario diagram)

There have been previous enquiries about this feature, as in:

  1. Previous Issue
  2. StackOverflow discussion

Here are the broad steps of our implementation:

          metadata:
            filter_metadata:
              envoy.reverse_conn:
                clusters:
                  - cluster_name: "cluster 1"
                    reverse_connection_count: 5
                  - cluster_name: "cluster 2"
                    reverse_connection_count: 10

The solution is shown in the diagram below. Further details on the implementation can be provided upon request.

(Image: reverse connection workflow diagram)

This feature addresses a common need among EnvoyProxy users, and we believe that integrating this reverse connection feature into EnvoyProxy's core functionality would greatly benefit its user base. We would like to ask if we can share our implementation with the EnvoyProxy community to push these changes upstream for broader adoption.

phlax commented 6 months ago

cc @wbpcode @alyssawilk @mattklein123

alyssawilk commented 6 months ago

I think this could and ideally would be implemented as an envoy extension. There's a policy regarding extension addition here: https://github.com/envoyproxy/envoy/blob/main/EXTENSION_POLICY.md If your work doesn't meet the guidelines, you could add this to contrib.

basundhara-c commented 6 months ago

Thanks for your reply @alyssawilk! I have described the current design in some detail below, as we made some crucial changes in envoy core to make this work.

Detailed Steps

Reverse Connection Components

Reverse Connection Initiation and Acceptance

  1. Reverse connection initiation is triggered by the addition of a listener (let's call it "rc_listener") with extra metadata fields. This metadata contains a list of remote clusters to which reverse connections are required and the number of reverse connections required for each, like so:
          metadata:
            filter_metadata:
              envoy.reverse_conn:
                source_node_id: "initiator_node"
                clusters:
                  - cluster_name: "cluster 1"
                    reverse_connection_count: 5
                  - cluster_name: "cluster 2"
                    reverse_connection_count: 10

This metadata indicates that instead of binding to a port and listening (bind_to_port is set to false), rc_listener has to invoke the reverse connection workflow. In TcpListenerImpl, we check whether the above metadata is present, and if so, we set bind_to_port to false, collect the cluster -> reverse connection count information into a "remote_cluster_to_conns" hashmap, and register a request for reverse connection creation.
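
For illustration, here is a rough sketch of that hook, assuming the "envoy.reverse_conn" metadata has already been parsed into a small struct; the struct, the helper function, and the registerRCInitiators call are names from this proposal or outright assumptions, not existing Envoy APIs.

    // Minimal sketch (not actual Envoy code) of the listener-creation check
    // described above; all names are assumed for illustration.
    #include <cstdint>
    #include <map>
    #include <string>

    // Parsed form of the "envoy.reverse_conn" listener metadata.
    struct ReverseConnConfig {
      std::string source_node_id;
      // "remote_cluster_to_conns": cluster_name -> reverse_connection_count.
      std::map<std::string, uint32_t> remote_cluster_to_conns;
    };

    class ReverseConnectionManager; // thread-local manager, sketched further below

    // Called where the listener would normally bind and listen.
    inline void maybeStartReverseConnections(const ReverseConnConfig* config, bool& bind_to_port,
                                             ReverseConnectionManager& /*rc_manager*/) {
      if (config == nullptr) {
        return; // no reverse connection metadata: normal listener behaviour
      }
      // Skip bind()/listen(); this listener only consumes reused sockets.
      bind_to_port = false;
      // rc_manager.registerRCInitiators(listener_tag, *config) would be invoked
      // here to create an RCInitiator for this listener tag.
    }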

The next few steps are performed by three new entities added within DispatcherImpl:

ReverseConnectionInitiator (RCInitiator)

A thread-local entity within the Dispatcher, created uniquely for each listener tag. On creation, the RCInitiator initiates "reverse_connection_count" connections to each "cluster_name" in rc_listener's metadata. Upon connection closure, it is invoked to re-initiate connections.

ReverseConnectionManager (RCManager)

A single thread-local resource that manages the lifecycle of several ReverseConnectionInitiators. The RCManager maintains a map "available_rc_initiators" of the RCInitiator created per listener tag, and a map "connection_to_rc_initiator_map" from each reverse connection's key to the RCInitiator that created and owns it. The RCManager provides a couple of important APIs, which are used in the steps below.

ReverseConnectionHandler (RCHandler)

A thread-local socket manager that functions only on the responder envoy side. It stores a map "accepted_reverse_connections" of initiator_node -> list of ConnectionSocketPtr, holding each accepted reverse connection socket.
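
To make the relationship between these three entities concrete, here is a rough interface sketch. The member names (registerRCInitiators, maintainConnCount, connection_to_rc_initiator_map, etc.) follow the description above, but the signatures and the placeholder socket type are assumptions for illustration, not actual Envoy types.

    // Rough interface sketch of the three proposed thread-local entities.
    #include <cstdint>
    #include <list>
    #include <map>
    #include <memory>
    #include <string>

    struct ConnectionSocket { int fd; };                      // placeholder for Envoy's socket type
    using ConnectionSocketPtr = std::unique_ptr<ConnectionSocket>;
    using ConnectionKey = std::string;                        // local "IP:port" of the client socket

    // Created per listener tag; opens and maintains reverse connections.
    class ReverseConnectionInitiator {
    public:
      void maintainConnCount();                               // periodic top-up of connections
      void onConnectionClosed(const ConnectionKey& key);      // triggers re-initiation
    private:
      std::map<std::string, uint32_t> cluster_to_conn_count_; // desired counts per cluster
    };

    // One per worker thread; owns every RCInitiator on that thread.
    class ReverseConnectionManager {
    public:
      void registerRCInitiators(uint64_t listener_tag,
                                const std::map<std::string, uint32_t>& remote_cluster_to_conns);
      void notifyConnectionClosed(const ConnectionKey& key);
    private:
      std::map<uint64_t, std::unique_ptr<ReverseConnectionInitiator>> available_rc_initiators_;
      std::map<ConnectionKey, ReverseConnectionInitiator*> connection_to_rc_initiator_map_;
    };

    // Responder-side socket cache keyed by the initiator node.
    class ReverseConnectionHandler {
    public:
      void addAcceptedConnection(const std::string& initiator_node, ConnectionSocketPtr socket);
      ConnectionSocketPtr popSocketForNode(const std::string& initiator_node);
    private:
      std::map<std::string, std::list<ConnectionSocketPtr>> accepted_reverse_connections_;
    };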

  1. The registerRCInitiators API is called by TcpListenerImpl upon discovery of reverse connection metadata, thus creating an RCInitiator. The created RCInitiator is stored in the "available_rc_initiators" map.

  2. The RCInitiator, upon initialization, runs a periodic function maintainConnCount() that iterates through the passed remote_cluster_to_conns map and initiates "reverse_connection_count" connections to each "cluster_name" (see the initiator-side sketch after this list). For each cluster, the RCInitiator obtains a thread-local cluster entry by calling the cluster manager's getThreadLocalCluster() and then obtains a TCP connection to that cluster. The ClientConnectionPtr is extracted and a ReverseConnectionHandshake HTTP POST request is written to it. This handshake contains information about the initiator envoy (node_id, cluster_id, etc.), and a protobuf is defined for the format. The connectionKey of this connection is defined as the local socket address (IP:port pair) and is obtained from the ClientConnectionPtr's ConnectionSocket. The RCInitiator adds a read filter to the ClientConnection so that responses from the responder envoy can be intercepted. It also maintains an internal map of cluster -> connection count so that connections can be re-initiated in case of closure.

  3. Each envoy has a listener called the "Transport Service Listener" that accepts reverse connections and serves as an endpoint for reverse-connection-related queries, e.g., obtaining reverse connection stats. We have added a new "reverse_conn" filter that performs these operations.

  4. The reverse_conn filter intercepts HTTP requests, and if a handshake is received, extracts the source information and verifies that a certificate is present (e.g., checking that the SANs match the cluster_id). The source node_id is a mandatory field in the reverse connection handshake; if it is not present, the handshake is rejected. A reverse connection handshake return HTTP message is sent to the initiator.

  5. If accepted, the reverse_conn filter extracts the raw downstream Connection from the stream filter callbacks and caches the raw ConnectionSocket. It resets file events on the socket's IoHandle, and calls the thread-local Dispatcher's RCHandler.

  6. The RCHandler adds the node_id -> ConnectionSocketPtr mapping to the accepted_reverse_connections map, and then does a couple of things (see the responder-side sketch after this list):

    • It triggers a periodic function to send RPING keepalives on all accepted connection sockets.
    • It obtains the underlying file descriptor from the connection socket and creates a file event to process RPING replies from the initiator envoy upon file read. If a ping response is not received within a user-defined timeout, the socket is marked dead.
  7. On the initiator envoy's side, the RCInitiator's read filter intercepts the reverse connection handshake return message and checks whether the handshake was accepted. If not, it closes the ClientConnection. If accepted, it resets file events on the connection socket, and then sets a new boolean flag, connection_reused, to true for the connection, so that the usual connection closure handling is skipped for a reverse connection. The RCInitiator -> connection info is added to the RCManager's connection_to_rc_initiator_map, after which the connection socket is passed to the initiating listener (rc_listener in this example).

  8. On the initiator end, rc_listener has an attached filter called the "reverse_connection" filter. The sole purpose of this filter is to wait for the RPING keepalives described above and respond to them. From the time a socket is accepted by this reverse_connection filter, if RPING keepalives are not received within a user-defined timeout, the socket is marked dead.
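
As a simplified, self-contained sketch of the initiator-side bookkeeping described above (maintainConnCount() plus re-initiation after closure): the cluster lookup, connection creation, and handshake write are collapsed into a single placeholder callback, so none of the names below are real Envoy APIs.

    // Initiator-side sketch: periodically top up reverse connections per cluster.
    #include <cstdint>
    #include <functional>
    #include <map>
    #include <string>
    #include <utility>

    struct ReverseConnHandshake {
      std::string node_id;    // initiator node
      std::string cluster_id; // initiator cluster
    };

    class RCInitiatorSketch {
    public:
      // Stands in for: getThreadLocalCluster(cluster) -> open a TCP connection ->
      // write the ReverseConnectionHandshake as an HTTP POST -> add a read filter
      // for the handshake response. Returns the connection key ("IP:port") on
      // success and "" on failure.
      using OpenConnectionFn =
          std::function<std::string(const std::string& cluster, const ReverseConnHandshake&)>;

      RCInitiatorSketch(std::map<std::string, uint32_t> desired, ReverseConnHandshake identity,
                        OpenConnectionFn open_connection)
          : desired_(std::move(desired)), identity_(std::move(identity)),
            open_connection_(std::move(open_connection)) {}

      // Runs periodically: brings every cluster up to its desired connection count.
      void maintainConnCount() {
        for (const auto& [cluster, want] : desired_) {
          while (established_[cluster] < want) {
            const std::string key = open_connection_(cluster, identity_);
            if (key.empty()) {
              break; // retry on the next periodic run
            }
            key_to_cluster_[key] = cluster;
            ++established_[cluster];
          }
        }
      }

      // Called when a reverse connection closes; the next maintainConnCount()
      // run re-initiates one connection to that cluster.
      void onConnectionClosed(const std::string& key) {
        auto it = key_to_cluster_.find(key);
        if (it != key_to_cluster_.end()) {
          --established_[it->second];
          key_to_cluster_.erase(it);
        }
      }

    private:
      std::map<std::string, uint32_t> desired_;     // remote_cluster_to_conns
      ReverseConnHandshake identity_;
      OpenConnectionFn open_connection_;
      std::map<std::string, uint32_t> established_; // live connections per cluster
      std::map<std::string, std::string> key_to_cluster_;
    };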
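
And a similarly simplified sketch of the responder-side RPING bookkeeping: the file-event registration on the socket's IoHandle and the actual ping write are reduced to comments, so the only behaviour shown is "ping periodically and mark the socket dead if no reply arrives within the timeout".

    // Responder-side sketch: cache accepted sockets and reap dead ones.
    #include <chrono>
    #include <list>
    #include <map>
    #include <string>

    class RCHandlerSketch {
    public:
      using Clock = std::chrono::steady_clock;

      explicit RCHandlerSketch(std::chrono::seconds ping_timeout) : ping_timeout_(ping_timeout) {}

      // Called by the reverse_conn filter after a successful handshake.
      void addAcceptedConnection(const std::string& initiator_node, int socket_fd) {
        accepted_reverse_connections_[initiator_node].push_back(socket_fd);
        last_reply_[socket_fd] = Clock::now();
        // A file event would be registered on socket_fd here so that RPING
        // replies from the initiator envoy invoke onPingReply(socket_fd).
      }

      // Invoked by the file event when an RPING reply is read from the socket.
      void onPingReply(int socket_fd) { last_reply_[socket_fd] = Clock::now(); }

      // Periodic pass: send RPING on every cached socket and drop sockets that
      // have not replied within the configured timeout.
      void pingAndReap() {
        const auto now = Clock::now();
        for (auto& entry : accepted_reverse_connections_) {
          entry.second.remove_if([&](int fd) {
            // sendPing(fd) would write the RPING keepalive here.
            return now - last_reply_[fd] > ping_timeout_; // dead socket: remove it
          });
        }
      }

    private:
      std::chrono::seconds ping_timeout_;
      std::map<std::string, std::list<int>> accepted_reverse_connections_;
      std::map<int, Clock::time_point> last_reply_;
    };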

Reverse Connection Re-Initiation in case of closure

  1. Upon connection closure, the RCManager is notified.

  2. The RCManager notifies the owning RCInitiator by looking it up in connection_to_rc_initiator_map.

  3. The owning RCInitiator updates the closure in its internal cluster -> connection map. The next iteration of maintainConnCount() initiates one more connection to the remote cluster.

Serving requests from upstream -> downstream envoy by using reverse connections

For requests from the upstream envoy to the downstream envoy to work through the cached sockets, the clusters used by the upstream (responder) envoy to forward such requests cannot figure out their list of endpoints by traditional means, because the list is neither static nor a DNS call away. Instead, they have to rely on the current list of reverse connections accepted by that envoy. To resolve this, we have introduced a new cluster type called "reverse_connection" (and a corresponding load balancer type) that allows the upstream envoy to dynamically pick a reverse connection socket based on the downstream request context. The upstream envoy config, therefore, should have rules that route traffic destined for downstream services (which should go via a reverse connection) to a cluster of type "reverse_connection".

  1. The upstream envoy expects such requests to come with the "x-dst-node-uuid" set. The value of the "x-dst-node-uuid" is the downstream node which exposes the service.

  2. The reverse_connection cluster maintains a map of node_id -> Host. Upon receipt of a request, a HostImpl is created for the node_id and node_id is set as the "host_id" for that host. Subsequent requests re-use the host.

  3. The addition of the host_id ensures that a reverse connection is used to send requests to that host. When the Host's createConnectionData is called, we check if the host_id is present, and if so, we invoke the Dispatcher to create a ReversedClientConnectionImpl (sketched below). The ReversedClientConnectionImpl extends ClientConnectionImpl and, instead of creating a client socket from the remote address, takes in the client and transport sockets directly. The client socket is obtained from the accepted_reverse_connections map by querying the RCHandler. In ReversedClientConnectionImpl we override the connect() method to do nothing, since we are already connected on the socket. The request is therefore sent over a reverse connection. The reverse_connection cluster also does periodic cleanup of stale hosts.
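
A minimal sketch of the "already connected" client connection idea from the last item above: the ClientConnection interface shown is a stand-in rather than Envoy's actual class, and the only point being illustrated is that connect() becomes a no-op because the cached reverse-connection socket is handed in fully established.

    // Sketch of a client connection that wraps a cached reverse-connection socket.
    #include <string>
    #include <utility>

    // Stand-in for a cached, already-established socket from the RCHandler.
    struct CachedSocket {
      int fd;
      std::string peer_node_id;
    };

    // Stand-in for the client connection interface used by the cluster/pool.
    class ClientConnectionSketch {
    public:
      virtual ~ClientConnectionSketch() = default;
      virtual void connect() = 0;
      virtual int fd() const = 0;
    };

    class ReversedClientConnectionSketch : public ClientConnectionSketch {
    public:
      // Takes a socket obtained from accepted_reverse_connections instead of
      // creating a new client socket from the remote address.
      explicit ReversedClientConnectionSketch(CachedSocket socket) : socket_(std::move(socket)) {}

      // The socket is already connected end-to-end, so connect() does nothing.
      void connect() override {}

      int fd() const override { return socket_.fd; }

    private:
      CachedSocket socket_;
    };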

The process is illustrated in the diagram above. This involves a couple of crucial changes in envoy's core dispatcher: during reverse connection initiation, and also in steps 14-15 to ensure that a reverse connection is picked by the Dispatcher. Do feel free to share any suggestions or requests for clarification on our current design so that we can work toward sharing these changes upstream!

basundhara-c commented 6 months ago

Hi @alyssawilk @wbpcode @mattklein123! I just wanted to follow up on the above design proposal detailing where we had to make envoy core changes to get reverse connections to work! We would love to get suggestions on any changes you'd like to see in order to get this feature integrated with envoy!

agrawroh commented 6 months ago

@basundhara-c Instead of creating a separate reverse connection, is it possible to open a BiDi stream from the downstream Envoy to the upstream Envoy and use the same stream when talking from upstream to downstream as well?

alyssawilk commented 5 months ago

> The process is illustrated in the diagram above. This involves a couple of crucial changes in envoy's core dispatcher: during reverse connection initiation, and also in steps 14-15 to ensure that a reverse connection is picked by the Dispatcher. Do feel free to share any suggestions or requests for clarification on our current design so that we can work toward sharing these changes upstream!

I don't understand why the reverse connection needs to be picked by the dispatcher. The dispatcher manages the event loop on a given thread - it doesn't "pick connections". You'd need a listening port on the thread (doable in an extension) and a custom connection pool which instead of creating a new connection waited for one to be established. I suspect much if not all of this can be done without core envoy changes so I'd suggest you prototype locally and ping back when you have specific changes?

basundhara-c commented 5 months ago

@alyssawilk thanks for the suggestion! We were looking into this and it might be possible; we'll be trying it out internally and will reach out when we have changes! @agrawroh I'm yet to dig into the code to understand how exactly that would be done, but I think @alyssawilk's suggestion might be a more generic version of what you are suggesting!

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 4 months ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

basundhara-c commented 3 months ago

@alyssawilk, on digging deeper, we find that the HTTP connection pool's base implementation calls the dispatcher (via the Host) to create a CreateConnectionData object that wraps the connection object and the host description. At this point the upstream connection is created, and we need to override this in our custom connection pool implementation to supply a cached socket instead of creating a new connection. Is it okay if we move the creation of CreateConnectionData to a separate function (say, init()) which we can override in our custom pool? Any suggestions are welcome!

alyssawilk commented 3 months ago

I don't think calling a virtual function in the constructor is going to get you very far, but I wouldn't object to a creation_function argument which defaults to the existing function and which you can set to something custom, if that'd work? OOO shortly as reflected on the maintainer calendar, so forgive any lag on further replies.
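
A minimal sketch of that shape, with every type and parameter name assumed for illustration (this is not Envoy's actual connection pool API): the pool takes a connection-creation callback that defaults to "open a fresh upstream connection", and a reverse-connection pool passes its own callback that wraps a cached socket instead.

    // Sketch of a connection pool that accepts a pluggable creation function.
    #include <functional>
    #include <utility>

    struct ConnectionDataSketch {
      int fd = -1;         // underlying socket
      bool reused = false; // true when backed by a cached reverse connection
    };

    class ConnPoolSketch {
    public:
      using CreateConnectionFn = std::function<ConnectionDataSketch()>;

      // The default models today's behaviour: dial a new upstream connection.
      explicit ConnPoolSketch(CreateConnectionFn create_connection =
                                  [] { return ConnectionDataSketch{-1, false}; })
          : create_connection_(std::move(create_connection)) {}

      ConnectionDataSketch newConnection() { return create_connection_(); }

    private:
      CreateConnectionFn create_connection_;
    };

    // Usage: a reverse-connection pool supplies a cached socket instead, e.g.
    //   ConnPoolSketch pool([cached_fd] { return ConnectionDataSketch{cached_fd, true}; });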

basundhara-c commented 3 months ago

That should work, thank you!

basu1706 commented 2 months ago

Hi @alyssawilk, as described in the diagram and description above, we are adding two new entities: ReverseConnectionManager (which manages the lifecycle of ReverseConnectionInitiators and needs to be initialized before rc listeners are initialized on the worker) and ReverseConnectionHandler (which caches sockets, supplies them for downstream requests, and listens for reverse connection requests). These two entities are thread-local and need to be initialized right after the workers are initialized. We are working on adding them as an extension, but were wondering what the best place would be to initialize them. We felt that we could initialize them in the worker implementation, where the connection handler and dispatcher are initialized. We would love to hear your thoughts on this!

alyssawilk commented 1 month ago

Sorry for the delay, I was out for a few weeks. Would it be possible for you to have the reverse connection code registered as a bootstrap extension and have all the bootstrap extensions be informed when the worker threads are created? They could get an onWorkerThreadInitialized hook the way they get onServerInitialized, and could create per-worker items there?

cc @wbpcode @zuercher for other thoughts
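
A rough sketch of what that could look like; the extension interface below is a stand-in rather than Envoy's actual BootstrapExtension API, with onWorkerThreadInitialized shown as the proposed additional hook.

    // Sketch of a bootstrap extension that creates per-worker reverse connection state.
    class BootstrapExtensionSketch {
    public:
      virtual ~BootstrapExtensionSketch() = default;
      // Existing-style hook: called once on the main thread after server init.
      virtual void onServerInitialized() = 0;
      // Proposed hook: called once on each worker thread after the worker's
      // dispatcher and connection handler exist.
      virtual void onWorkerThreadInitialized() = 0;
    };

    class ReverseConnBootstrapExtension : public BootstrapExtensionSketch {
    public:
      void onServerInitialized() override {
        // Main-thread setup (config validation, stats scopes, etc.).
      }
      void onWorkerThreadInitialized() override {
        // Running on a worker thread: create the thread-local
        // ReverseConnectionManager and ReverseConnectionHandler here, so they
        // exist before reverse connection listeners start on this worker.
      }
    };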

basundhara-c commented 1 month ago

Thanks a lot @alyssawilk, we attended an Envoy meeting wherein Greg suggested something along the same lines. I am currently exploring this option!