gardener / gardener


Distribute HTTP/2 calls across all API server instances #8810

Open timebertt opened 7 months ago

timebertt commented 7 months ago

How to categorize this issue?

/area control-plane networking scalability
/kind enhancement

Context

Since GEP-08, the istio ingress gateway forwards traffic to the respective shoot API server based on the SNI in the TLS ClientHello (see control plane endpoints). In this architecture, TLS connections are terminated by the individual API server instances. Once a TLS connection is established, all requests sent over it end up on the same API server instance. In other words, the istio ingress gateway performs L4 load balancing only and doesn't distribute individual requests across API server instances.
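For illustration, here is a minimal Go sketch of such SNI-based L4 routing – a toy stand-in for what the istio ingress gateway does, not its actual implementation; the listen address and the SNI-to-backend mapping are made up:

```go
package main

import (
	"bytes"
	"crypto/tls"
	"errors"
	"io"
	"net"
	"time"
)

// readOnlyConn lets crypto/tls parse a ClientHello from a reader without
// ever writing a reply, so the handshake can be aborted after peeking.
type readOnlyConn struct{ r io.Reader }

func (c readOnlyConn) Read(p []byte) (int, error)       { return c.r.Read(p) }
func (c readOnlyConn) Write(p []byte) (int, error)      { return 0, io.ErrClosedPipe }
func (c readOnlyConn) Close() error                     { return nil }
func (c readOnlyConn) LocalAddr() net.Addr              { return nil }
func (c readOnlyConn) RemoteAddr() net.Addr             { return nil }
func (c readOnlyConn) SetDeadline(time.Time) error      { return nil }
func (c readOnlyConn) SetReadDeadline(time.Time) error  { return nil }
func (c readOnlyConn) SetWriteDeadline(time.Time) error { return nil }

// peekSNI parses the SNI from the ClientHello and returns the bytes
// consumed so far, so they can be replayed to the chosen backend.
func peekSNI(conn net.Conn) (sni string, consumed []byte, err error) {
	var buf bytes.Buffer
	_ = tls.Server(readOnlyConn{io.TeeReader(conn, &buf)}, &tls.Config{
		GetConfigForClient: func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
			sni = hello.ServerName
			return nil, errors.New("peek only") // abort the handshake
		},
	}).Handshake()
	if sni == "" {
		return "", nil, errors.New("no SNI in ClientHello")
	}
	return sni, buf.Bytes(), nil
}

func main() {
	// Hypothetical SNI -> backend mapping; the real gateway resolves the
	// shoot API server endpoints dynamically.
	backends := map[string]string{"api.shoot.example": "10.0.0.1:443"}

	ln, err := net.Listen("tcp", ":8443")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(client net.Conn) {
			defer client.Close()
			sni, consumed, err := peekSNI(client)
			if err != nil {
				return
			}
			backend, err := net.Dial("tcp", backends[sni])
			if err != nil {
				return
			}
			defer backend.Close()
			backend.Write(consumed) // replay the ClientHello bytes
			go io.Copy(backend, client)
			io.Copy(client, backend) // TLS stays end-to-end: pure L4 relay
		}(conn)
	}
}
```

Note that the proxy never decrypts anything: it only peeks at the ClientHello and then relays raw bytes, which is exactly why all streams of one connection stick to one backend.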

Typical HTTP/2 implementations (e.g., in Go) reuse a single TCP/TLS connection as long as MAX_CONCURRENT_STREAMS is not reached – this was basically the promise of HTTP/2: many concurrent streams over one L4 connection. As a result, a typical controller based on client-go sends all of its API requests to a single API server instance. When HTTP/2 is deactivated, however, client-go opens a pool of TCP/TLS connections instead and distributes API requests across these L4 connections, which yields a good distribution across API server instances because the istio ingress gateway balances load on this layer.
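For reference, a minimal sketch of deactivating HTTP/2 in a client-go based component (the kubeconfig path is a placeholder; client-go also honors the DISABLE_HTTP2 environment variable):

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// newHTTP1Clientset builds a clientset that negotiates HTTP/1.1 only.
// Without "h2" in the ALPN protocol list, client-go's transport keeps a
// pool of TCP/TLS connections and spreads requests across them instead of
// multiplexing everything over a single HTTP/2 connection.
func newHTTP1Clientset(kubeconfigPath string) (kubernetes.Interface, error) {
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return nil, err
	}
	config.TLSClientConfig.NextProtos = []string{"http/1.1"}
	return kubernetes.NewForConfig(config)
}

func main() {
	// Placeholder path; in a real component the config usually comes from
	// in-cluster configuration or a flag.
	if _, err := newHTTP1Clientset("/path/to/kubeconfig"); err != nil {
		panic(err)
	}
}
```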

Problem Statement

With the current architecture, we can't make use of the HTTP/2 protocol in shoot API requests. In fact, activating HTTP/2 can even come with a performance and scalability penalty compared to HTTP/1.1. However, HTTP/2 is used by default by most clients, including client-go. We can observe an unequal distribution of API requests across API servers, especially in clusters with "big" operators that perform most of the API requests. This risks overloading individual API server instances while others sit idle.

As a consequence, the efficiency of autoscaling the API server vertically is reduced, because the resource footprint of the instances can differ significantly – VPA works best with equally sized instances. Also, when API server instances are terminated/rolled, their TLS connections are destroyed. The affected clients then reconnect and tend to flood a single other instance with requests (especially re-list and watch requests) instead of distributing them across all remaining healthy instances.

Ideas

This is not a fully-fledged proposal yet. The issue should serve as a starting point for discussing the problem and ideas. Based on the feedback, we could create a full proposal later on.

We could introduce L7 load balancing for shoot API servers, i.e., multiplex HTTP/2 streams from a single TLS connection to multiple instances. For this, we would need to terminate TLS connections earlier in the network flow, in a proxy in front of the API server. This proxy could either be the existing istio ingress gateway (global proxy) or an additional proxy per shoot control plane (local proxy). It would open backend connections to all API server instances and multiplex incoming streams over these backend connections.
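As a rough illustration of the local-proxy variant, a Go sketch using httputil.ReverseProxy that terminates TLS and picks a backend per request, i.e., per HTTP/2 stream (backend addresses, certificate paths, and the round-robin policy are assumptions, not a concrete design):

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Hypothetical endpoints, one per kube-apiserver instance.
	backends := []*url.URL{
		{Scheme: "https", Host: "10.0.0.1:443"},
		{Scheme: "https", Host: "10.0.0.2:443"},
	}

	var next uint64
	proxy := &httputil.ReverseProxy{
		// Director runs once per request, so every HTTP/2 stream of a
		// single client connection can land on a different instance.
		Director: func(req *http.Request) {
			b := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
			req.URL.Scheme = b.Scheme
			req.URL.Host = b.Host
		},
		// A real proxy would also set a Transport that trusts the API
		// server CA and presents the proxy's own client credentials.
	}

	// The proxy presents the server certificate clients expect from the
	// shoot API server (placeholder file names).
	panic(http.ListenAndServeTLS(":443", "tls.crt", "tls.key", proxy))
}
```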

In addition to presenting the expected server certificate to the client, the proxy would also need to translate L4 authentication (TLS client cert) into L7 authentication information, i.e., put the client certificate's CN/O into the --requestheader-*-headers configured in the API server (similar to https://github.com/envoyproxy/envoy/issues/6601). Envoy already supports the XFCC header for this, but the API server doesn't understand XFCC (https://github.com/kubernetes/kubernetes/issues/78252). Envoy could probably still be configured to pass the information in the format the API server expects, e.g., via a wasm plugin.
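To make the translation concrete, here is a sketch of what such a proxy would do per request, assuming the API server is started with --requestheader-username-headers=X-Remote-User and --requestheader-group-headers=X-Remote-Group (the header names and the demo certificate are made up for illustration):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"net/http"
)

// setFrontProxyHeaders translates the verified TLS client certificate into
// the request-header authentication headers, assuming the API server is
// configured with --requestheader-username-headers=X-Remote-User and
// --requestheader-group-headers=X-Remote-Group.
func setFrontProxyHeaders(req *http.Request) {
	// Drop any client-supplied values first; these headers must only ever
	// be set by the trusted front proxy.
	req.Header.Del("X-Remote-User")
	req.Header.Del("X-Remote-Group")

	if req.TLS == nil || len(req.TLS.PeerCertificates) == 0 {
		return
	}
	cert := req.TLS.PeerCertificates[0]
	req.Header.Set("X-Remote-User", cert.Subject.CommonName) // CN -> username
	for _, org := range cert.Subject.Organization {          // O  -> groups
		req.Header.Add("X-Remote-Group", org)
	}
}

func main() {
	// Tiny demonstration with a fabricated, already-verified client cert.
	req, _ := http.NewRequest(http.MethodGet, "https://api.shoot.example/api", nil)
	req.TLS = &tls.ConnectionState{PeerCertificates: []*x509.Certificate{{
		Subject: pkix.Name{CommonName: "gardener", Organization: []string{"system:masters"}},
	}}}
	setFrontProxyHeaders(req)
	fmt.Println(req.Header.Get("X-Remote-User"), req.Header["X-Remote-Group"])
}
```

The proxy itself would then authenticate to the API server with a client certificate signed by the CA configured via --requestheader-client-ca-file, which is what makes the API server trust these headers in the first place.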

timebertt commented 7 months ago

Things that still need to be discussed:

Please add your feedback to this issue :)

mwennrich commented 7 months ago

shouldn't the kube-apiserver --goaway-chance flag prevent this?

--goaway-chance float To prevent HTTP/2 clients from getting stuck on a single apiserver, randomly close a connection (GOAWAY). The client's other in-flight requests won't be affected, and the client will reconnect, likely landing on a different apiserver after going through the load balancer again. This argument sets the fraction of requests that will be sent a GOAWAY. Clusters with single apiservers, or which don't use a load balancer, should NOT enable this. Min is 0 (off), Max is .02 (1/50 requests); .001 (1/1000) is a recommended starting point.
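For illustration, a minimal Go sketch of the mechanism behind this flag (not the actual kube-apiserver code; the chance value and certificate paths are placeholders). It relies on Go's HTTP/2 server turning a "Connection: close" response header into a graceful GOAWAY:

```go
package main

import (
	"math/rand"
	"net/http"
)

// withProbabilisticGoaway mimics the idea behind --goaway-chance: for a
// small fraction of HTTP/2 requests, ask the server to close the client's
// connection gracefully. Go's HTTP/2 server translates the "Connection:
// close" response header into a GOAWAY frame, so in-flight streams finish
// while new requests force the client to reconnect (and be re-balanced).
func withProbabilisticGoaway(next http.Handler, chance float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && rand.Float64() < chance {
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	handler := withProbabilisticGoaway(http.DefaultServeMux, 0.001)
	// HTTP/2 in net/http requires TLS; certificate paths are placeholders.
	panic(http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", handler))
}
```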

timebertt commented 7 months ago

Interesting, I will try this out. So far, I assumed that sending a GOAWAY causes the client to establish a new TLS connection and send all future requests to a new server (only in-flight requests, i.e., long-running requests like watches, will stick to the old connection). However, this wouldn't distribute concurrent requests across API server instances, but rather make the client randomly jump from one server to the next at a regular rate. I might be mistaken though.

gardener-ci-robot commented 4 months ago

The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to its staleness rules.

/lifecycle stale

gardener-ci-robot commented 3 months ago

The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to its staleness rules.

/lifecycle rotten

timebertt commented 3 months ago

/remove-lifecycle rotten

I still want to try out the mentioned API server flag.

gardener-ci-robot commented 2 weeks ago

The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to its staleness rules.

/lifecycle stale