[Feature] User-specific routable Gefyra bridge ("user bridge")

Schille commented 2 weeks ago

Intro

Gefyra currently supports "global bridge" only. See https://gefyra.dev/docs/run_vs_bridge/#bridge-operation to learn more. In short, a container within a running Pod (or multiple replicas) is replaced by a Gefyra component called Carrier. This allows Gefyra, with some constraints, to route traffic to a local container originally targeted to the specified Pod within the cluster.

gefyra-bridge-action drawio

This capability helps debug local containers using traffic from the cluster, rather than synthetic or made-up local traffic. However, the bridge is currently globally effective for all traffic directed to the bridged Pod, which may sometimes be undesired. This means that only one bridge per Pod can exist, allowing only one user to bridge a Pod at a time. With this feature proposal, we aim to lift that limitation in a flexible yet robust way.

This feature addresses the following issues:

Remark: Remember that one of Gefyra's fundamental premises is not to interfere with Kubernetes objects from your workloads. The proposed feature draft does not involve modifying existing deployment artifacts. Why? If something goes wrong (as things often do), we want Gefyra users to be able to restore the original state simply by deleting Pods. However, there may be residual objects or other additions that should never disrupt the operations of the development cluster. Gefyra aims to minimize this risk by treating it as a bug.

What is the new feature about?

Gefyra's bridge operation will support specific routing configurations to intercept only matching traffic, allowing all unmatched traffic to be served from within the cluster. Multiple users will be able to intercept different traffic simultaneously, receiving it on their local container (started by gefyra run ...) to serve it with local code.

gefyra-personal-bridge drawio

Departure

The main components involved in establishing a Gefyra bridge are:

Gefyra Client: Requests a Gefyra bridge that is globally effective.
Gefyra Operator: Acts on the bridge request by setting up the target Pod with Carrier, establishing a chain of reverse proxies into the client network (including Kubernetes objects), and reporting back the result of the operation.
Gefyra Stowaway (connection provider): Dynamically creates a reverse proxy for the specific bridge into the client network.
Gefyra Carrier (bridge provider): Replaces running containers and proxies incoming TCP/UDP traffic to the reverse proxy chain set up by the Operator.

Remark: Gefyra's cluster component architecture consists of different interfaces. The connection provider and bridge provider are two abstract concepts with defined interfaces. "Stowaway" and "Carrier" are the current concrete implementations of these interfaces. However, depending on the results of this implementation, I expect at least the latter to be replaced by a new component (perhaps Carrier2?). For consistency, I will continue to use these component names.

Overview

gefyra-personal-bridge1 drawio

Carrier

gefyra-personal-bridge2 drawio

Currently, Carrier is installed into 1 to N Pods. Each instance upstreams any incoming traffic ("port x") to a single target endpoint ("upstream-1"). This process does not involve traffic introspection: IP packets come in and are sent out as-is. This setup is simple and fast. Carrier is based on the Nginx server and thus is configured using the stream directive: https://nginx.org/en/docs/stream/ngx_stream_core_module.html#stream

Feature draft

Stage 1: Installation & keep original Pods around to serve unmatched traffic

When a compatible service is bridged, we need the original workloads to serve any unmatched traffic through the user bridge. Consider the following example: a compatible workload <Y> is selected by a Kubernetes service object. This workload consists of 3 Pods.

gefyra-personal-bridge4 drawio

Once a user bridge is requested, Gefyra's Operator replicates all essential components (most importantly, the Pods and the service) by cloning and modifying them. Pod <Y1'> is modified on the fly so that it is selected by service <Y'>. The Pods <Y1'>, <Y2'> and <Y3'> must not be selected by service <Y>. Most other parameters - such as mounts, ports, probes, etc. - should remain unchanged.

gefyra-personal-bridge5 drawio

The cloned workload infrastructure remains active as long as at least one Gefyra user bridge is active.

The Gefyra Operator installs Carrier into the target Pods (<Y1>, <Y2> and <Y3>) and dynamically configures them to send all unmatched traffic to the cloned infrastructure <Y'>. This setup ensures:

No existing workloads are modified (except for temporary image changes).
Common traffic can still be served, with just one additional hop.
If the cluster setup is interrupted, it can be easily restored by re-rolling out the source of the ReplicationSet <Y>

gefyra-personal-bridge7 drawio

Of course, if there is a different replication factor or other deployment scenarios (e.g., Pod only), the Gefyra Operator adapts accordingly. I hope the idea makes sense.

Stage 2: Add a local upstream & redirect matching traffic

The Carrier component will require significant changes as we shift from a “stream”-based proxy to a more advanced proxy ruleset, incorporating path and header matching for HTTP, along with routing rules for other protocols in the future. Fortunately, the required changes in the Gefyra Operator are not as extensive as those in Carrier. Several interfaces already support creating different routes within the connection provider ("Stowaway") and bridge provider abstractions.

gefyra-personal-bridge8 drawio

Interface reference for connection providers (Stowaway):

https://github.com/gefyrahq/gefyra/blob/9fcbf7ec167b5a8bf470f710d8c3f6444f9253be/operator/gefyra/connection/abstract.py#L64-L103

Interface reference for bridge providers (Carrier, Carrier2):

https://github.com/gefyrahq/gefyra/blob/9fcbf7ec167b5a8bf470f710d8c3f6444f9253be/operator/gefyra/bridge/abstract.py#L40-L61

Rules

The GefyraBridge CRD already supports an arbitrary set of additional configuration parameters for the bridge provider. https://github.com/gefyrahq/gefyra/blob/9fcbf7ec167b5a8bf470f710d8c3f6444f9253be/operator/gefyra/resources/crds.py#L17-L25

For HTTP traffic, the routing parameters appear to be quite obvious:

path matching (e.g. /api/objects/5)
header matching (e.g. owner: john)

Each user bridge adds a new entry to the upstream servers for Carrier, along with an additional (verified) matching rule. The operator's validating webhook should implement matching rule validation to catch common mistakes (e.g., a rule already applied by another user or a rule that never fires due to another bridge capturing all traffic). If a matching rule is invalid, the creation of the GefyraBridge is halted immediately.

Stage 3: Remove a user bridge

Removing a bridge is a two-phase process: 1) The deletion request from the Gefyra Client prompts the Operator to initiate the bridge removal. 2) Both the bridge provider and the connection provider are called upon to delete their respective routing configurations.

Remove the last user bridge & clean up

If Stage 3 removes the last active bridge for a Pod, the uninstallation procedure is triggered. This process includes resetting the patched Pods (<Y1>, <Y2> and <Y3>) to their original configuration and removing the cloned infrastructure (Pod <Y1'>, <Y2'>, <Y3'> and service <Y'>).

Closing remarks

I would like to retain the current implementation of the "global bridge," as it is a high-performance solution. Therefore, we should add a new flag, gefyra bridge ... --global to enable the global bridge with its current behavior.
Setting up the first user bridge for a Pod/Deployment will be more time-intensive than setting up subsequent user bridges (since Stage 1 is triggered), so we should handle this appropriately.
This concept is likely not applicable to RWO StatefulSets, as we cannot simply clone a StatefulSet Pod with an RWO mount; this limitation should be conveyed with an error message.
I currently don’t have a solution for HTTPS traffic interception by Carrier. One possibility could be to add a parameter in the bridge request to direct Carrier to the PKI, or to support the creation of custom bridge provider images. (The UID/GID issues remain as well.) With a custom image, we could integrate specific bridge parameters and designate this image for a particular Pod.

This feature is currently in the ideation phase. I would appreciate any external feedback on how to make this as useful and robust as possible. If you want to talk to me about this (or Gefyra in general), please find me around at our Discord server: https://discord.gg/Gb2MSRpChJ

I am also looking for a co-sponsor of this feature. If you or your team want to support this development, please contact me.

liquidiert commented 1 week ago

First off: Great RFC @Schille ! Just one quick question: What happens to the shadow infrastructure when the original might change while a bridge is active? I'm sure there's already handling for this case when using global bridge but what is the appropriate procedure here?

Schille commented 1 week ago

@liquidiert Gotcha! A rollout of the original workloads would render the bridge useless since Gefyra's patch will be reset. The operator should reconcile all bridges, detect that situation, and take appropriate action (patching again, setting up user bridges to work again). Or declare existing user bridges stale and remove them.

liquidiert commented 1 week ago

@Schille that sounds like a good reconciliation tactic, thanks!

crkurz commented 1 week ago

This look terriffic, @Schille ! Thanks a lot !

Please allow me to add some questions

Do we need to call out that multiple users can bridge multiple services?
Nit: Terminology: does it make sense to change "and the removal of the phantom infrastructure..." to "and the removal of the cloned infrastructure"? (just to avoid an extra name)
Are there any limitations which apply to infra cloning? Things a pod/service configuration must or must not have? E.g. node- or other-affinity? special session-handling/routing ? Should I try to get Anton's/Rohit's thoughts here? e.g. around special handling for WebSockets with their need for cross-user session handling ?
Are there chances for any impact on validity of server certificates due to the traffic redirection?
How long do we expect setup (or tear-down) of cloned infra to take? and for how much of this time do we expect the regular service to be non-responsive? - In case this could take a bit more time, do we need an option to preserve phantom infra even after removal of last bridge? Or even an option to explicitly install phantom infra independent of bridge setup?

Again, great feature! Thank you, @Schille

Schille commented 6 days ago

@crkurz Thank you.

To your questions:

You should already be able to bridge multiple services simultaneously. If that's unclear, we must add that bit to the docs.
You are right. I changed it.
I don't see more limitations than mentioned. Since we'll clone the pods with all attributes (except for the selector-relevant labels) I don't expect affinity issues. But the more people who join the party, the better it is: I would welcome it if you would take up Anton/Rohit's thoughts on this.
Yes, that's not 100% clear as of now. We must find a solution to tell Carrier which certificates to use to introspect SSL traffic and decide on the route.
That depends. Small apps - short setup time. Java - huge setup time. =) I thought about that too and I am tempted to agree to a concept that represents the bare installation of a bridge without actually having a single user to match traffic,

gefyrahq / gefyra