gardener / gardener

Homogeneous Kubernetes clusters at scale on any infrastructure using hosted control planes.
https://gardener.cloud
Apache License 2.0

Autonomous Shoot Clusters #2906

Open vlerenc opened 3 years ago

vlerenc commented 3 years ago

Autonomous Shoot Clusters

Gardener has become a great tool to create and manage clusters with very low TCO. Part of this success is its design of running control planes in seed clusters under the joint supervision of Gardener and Kubernetes. However, there is sufficient pull to also find a way to create autonomous shoot clusters using Gardener (e.g. for the edge or for air-gapped scenarios), where the control plane must run side-by-side with the worker nodes and cannot run in a seed cluster. This BLI caters to this demand.

Definition

Autonomous shoot clusters do not have their control plane in a seed cluster, but operate it on dedicated control plane nodes in the same network, alongside the worker nodes, which makes these clusters autonomous (hence the name).

Related or Similar Terms, Term Disambiguation

Autonomous shoot clusters are sometimes confused with untethered shoot clusters, but there is no direct relationship. Tethering and untethering are terms that were introduced to describe whether or not a shoot cluster is managed (at the moment) by Gardener. If it's tethered, life cycle management operations, such as updating its spec/Kubernetes or operating system versions, are possible. A tethered shoot cluster would appear under Gardener's "single pane of glass" (so to speak). An untethered shoot cluster, sometimes also called an "air-gapped" shoot cluster, is temporarily not managed by Gardener. Today, management cannot be "turned off", but in the future this could become a deliberate decision to avoid any interference (e.g. while a mission-critical operation is ongoing on the customer's side/site). In that sense, regular shoot clusters can also be untethered, but today that usually happens only in an emergency, e.g. if the seed cluster lost network connectivity to the garden cluster. The term, or rather the state, is therefore not unknown to Gardener (it is part of our resilience tests). Tethering and untethering now become dedicated terms for the actions that lead to these states. While untethered shoot clusters will be more commonplace with autonomous shoot clusters, those clusters do not have to be untethered, certainly not at all times, and so these terms are not synonymous.

Autonomous shoot clusters are sometimes confused with bare metal/VM shoot clusters, but there is no direct relationship either. You could have bare metal/VM nodes also joining a regular shoot cluster control plane (exclusively or in combination with managed worker nodes; see Gardener Slack channel). So these two things, autonomous shoot clusters and bare metal/VM shoot clusters, are not synonymous. Of course, the means to let bare metal/VM nodes join a cluster will probably become a valuable building block, making autonomous shoot clusters more valuable and opening the use case up further.

Autonomous shoot clusters are sometimes confused with on-prem shoot clusters, but there is no direct relationship either. You could run an on-prem Gardener managing regular shoot clusters on OpenStack or vSphere, while on the other hand you could have autonomous shoot clusters on fully managed cloud providers like AWS, Azure, or GCP to be independent of a seed cluster (e.g. to avoid the network traffic, firewall the entire cluster off, avoid runtime dependencies, fulfill compliance obligations, etc.). So these two things, autonomous shoot clusters and on-prem shoot clusters, are not synonymous. Of course, for those customers who do not want to host their own Gardener but have clusters on-prem, autonomous shoot clusters will become an option and appear synonymous to them, if they don't want to run the control planes managed by us in the cloud (for security and compliance reasons) or simply cannot run them on a remote seed cluster (for technical reasons such as network connectivity) and need them side-by-side with their worker nodes.

Autonomous shoot clusters were also called masterful shoot clusters (in contrast to today's masterless shoot clusters), but this term is no longer politically correct.

Why (do we do this)

We want to establish Gardener in (new) environments where clusters cannot run their control plane "somewhere else", but need it side-by-side with their worker nodes, e.g. to avoid the network traffic, firewall the entire cluster off, avoid runtime dependencies, fulfill compliance obligations, etc.

We do not plan to make this a drop-in replacement for k3s or kubeadm (that would be too far away from Gardener's mission statement, inception goals, and current implementation), but want to offer this new type of shoot cluster as a separate flavor to reach on-prem use cases that cannot be served with Gardener today.

On-Premise Use Cases

All companies have their own "Enterprise IT" that leverages the cloud, their own DCs/partner DCs, or both. Sometimes, companies also have a less experienced "Plant IT" helping with the shop floor IT (anything from small racks to NUCs, mostly carrying out orders or installing pre-fabricated packages issued by "Enterprise IT"). These companies use managed Kubernetes services in the cloud, but also Kubernetes distros such as Rancher and OpenShift on-premise, managed by their "Enterprise/Plant IT".

Gardener has supported the cloud use case since its inception (zero touch). With autonomous shoot clusters, Gardener would like to establish itself as an option for on-premise use cases as well, such as:

Note: IronCore is not considered bare metal/VM in the context of autonomous shoot clusters (even if IronCore runs on bare metal/VM nodes, so do AWS, Azure, and GCP). From Gardener's point of view, IronCore is indistinguishable from any other cloud provider. It offers programmable infrastructure components such as networks and virtual machines just like any other cloud provider for Gardener and therefore doesn't need any special handling. Also, we cannot assume to find IronCore at all companies. Bare metal/VM nodes, on the other hand, are ubiquitous.

Gardener Runtime/Soil Cluster Replacement

Replacing our runtime and soil clusters with our own technology has been the "original goal", to avoid the dependency on another Kubernetes service or distro, and indeed it would make Gardener truly independent of others. However, on a more rational level, this by itself is not a sufficient reason for this (complex) undertaking. Only a few parties run Gardener installations, since these serve the purpose of provisioning thousands of clusters, and those who do can either use an existing managed Kubernetes service, a distro like k3s, or basic tools such as kubeadm (they have the expertise). Also, Gardener will never become a drop-in replacement for k3s or kubeadm (that would be too far away from Gardener's mission statement, inception goals, and current implementation).

The bottom line is that autonomous shoot clusters are not strictly needed to replace the runtime/soil clusters, because we do not really need to replace them. That means we should probably avoid implementing a form of autonomous shoot cluster that is totally independent of any running Gardener anywhere, as that is probably a lot more difficult to achieve than relying on an existing Gardener. If autonomous shoot clusters can be created from an existing Gardener (in the cloud, on premise, hosted from a notebook kind cluster, or whatever), this can and will help us later to either pivot an existing Gardener to this cluster or deploy a new one there and then, in both cases, tether (=claim) said cluster from the Gardener that runs on it, so that it becomes self-hosted. Soil clusters are an even easier goal and will be straightforward, if we can create (and manage) autonomous shoot clusters from an existing Gardener.

How (do we do this)

In order to compete with already established tools or solutions, it must be easy to set up autonomous shoot clusters or we will never be considered.

Prerequisites, Assumptions, Principles

After having discussed the why, we can now define the boundary conditions and scope. For instance, it will probably be a lot harder to bring up a Gardener autonomous shoot cluster without a Gardener. If that becomes a goal (later), we can imagine mapping this problem to a local Gardener, e.g. running on a notebook kind cluster. Of course, that's not as slim as a single-binary installer, but such an installer would probably take us a lot farther away from our current Gardener code and thereby become a separate piece of code that bears only little resemblance to the Gardener we know today (we'd like to avoid a situation that others have been facing, e.g. GKE and GKE-on-prem or Kubermatic and KubeOne being totally different things).
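For illustration only, such a bootstrap environment could be as small as a single-node kind cluster; the config below is a plain kind example and does not show any Gardener-specific setup:

```yaml
# Minimal kind cluster that a bootstrap Gardener could run on (illustrative sketch).
# Create it with: kind create cluster --config kind-bootstrap.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane   # a single node is enough for a throwaway bootstrap environment
```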

That said, these are our prerequisites, assumptions, and principles (trying to avoid a complete rewrite of Gardener, playing nicely together with Gardener and its architecture):

Idea

The idea is to transform the shoot cluster into its own seed cluster and host the gardenlet, the required extensions, and the control plane pods on dedicated control plane nodes managed by MCM. To survive a complete shutdown/reboot of all nodes, the critical components needed to reestablish the control plane (such as ETCD, KAPI, KCM, KSCH, etc.) would be brought up as static pods, so that the control plane always comes back up and, with it, everything else that is scheduled as regular pods. As long as 2 control plane nodes are up or can come up, even extension/MCM-based self-healing will be possible (e.g. provisioning replacement control plane nodes). If more or all nodes are lost, it should be possible to repair the cluster in a similar way as it was created initially, but with an ETCD backup to boot.

Alternatively, we bring up the entire control plane as static pods, if using the shoot cluster control plane for some of the control plane pods itself doesn't really make life simpler (more changes for the gardenlet, but maybe only in the flow; still, it would definitely be double the effort).
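As a rough sketch of what "brought up as static pods" could mean in practice (all images, flags, and host paths below are illustrative assumptions, not an actual Gardener layout), the kubelet on a control plane node would pick up manifests like the following from its static pod directory and recreate them on every boot, independent of any running control plane:

```yaml
# Illustrative static pod manifest, e.g. /etc/kubernetes/manifests/kube-apiserver.yaml
# on an autonomous control plane node. The kubelet (re)starts this pod on every boot,
# which is what lets the control plane come back after a complete shutdown.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  priorityClassName: system-node-critical
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.4    # assumed image/version
    command:
    - kube-apiserver
    - --etcd-servers=https://127.0.0.1:2379          # assumed: ETCD runs as a static pod on the same nodes
    - --secure-port=443
    # further flags, certificates, and admission configuration omitted
    volumeMounts:
    - name: pki
      mountPath: /etc/kubernetes/pki
      readOnly: true
  volumes:
  - name: pki
    hostPath:
      path: /etc/kubernetes/pki                      # assumed certificate location on the host
```

Similar manifests would be needed for ETCD, KCM, KSCH, etc.; how they get rendered, rotated, and updated is the part the gardenlet/extensions would have to take over here.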

Flow

Proof of Concept

To test the waters/experiment with the idea, we should "build" a proof of concept, short-cutting our way to the first autonomous shoot cluster in the following way:

What

The execution plan is not yet defined. We are still in early discussions.

However, once those are concluded, we'd like to "build" such an autonomous shoot cluster manually to have a proof of concept that can help us understand the complexities and thereby inform our next steps, but also to encourage/motivate us, if successful... or stop all discussions around this topic for good, if not. See the proof of concept section above.

So far, we see the following (sub) topics that will have to be addressed for autonomous shoot clusters, e.g. starting with them in a hackathon:

In addition, we need to have an early discussion with the ETCD team about the autonomous shoot cluster scenario and how to solve it with the druid, its upcoming operations, the member resource (replacing the lease resources), the steward, and in general today's dependency on a running seed cluster, which the druid cannot rely upon anymore (and neither can it assume the shoot cluster control plane to be up, because that is based on ETCD). This may be a problem in principle. Possibly, we need to invent some other synchronization solution via the static ETCD druid/steward pods (without Kubernetes as persistence), e.g. via peer-to-peer communication (much like the ETCDs have their own peer-to-peer communication to get themselves going). Besides this question, there is also the general question of how to deal with backups and what to offer in case no blob store is available (internally) or desired (externally), e.g. should we offer to write the backups also to other sinks like a (network) file share, so that the customer can automate the backup process further and we provide instructions for restore (resp. restore from that file share)?
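To make the last question a bit more concrete, one purely hypothetical option would be to point the backup store at a locally mounted (network) file share instead of a blob store; the field names below follow the druid.gardener.cloud/v1alpha1 Etcd resource as we understand it today, and the provider value and mount path are assumptions that would need to be validated with the ETCD team:

```yaml
# Hypothetical sketch only: ETCD backups written to a mounted (network) file share
# instead of a blob store. Whether a file-based/"Local" provider is suitable here,
# and the exact field semantics, are open questions for the ETCD team discussion.
apiVersion: druid.gardener.cloud/v1alpha1
kind: Etcd
metadata:
  name: etcd-main
  namespace: kube-system
spec:
  replicas: 3
  backup:
    store:
      provider: Local                  # assumed: file-based provider instead of S3/GCS/ABS/...
      container: /mnt/etcd-backups     # assumed: an NFS/SMB share mounted on the control plane nodes
      prefix: autonomous-shoot--etcd-main
```

Restore would then essentially mean pointing the restorer at the same path; this only illustrates the question raised above, not a decided design.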

gardener-ci-robot commented 6 days ago

The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to the following rules:

You can:

/lifecycle stale