Open cah-hbaum opened 1 week ago
e.g. Kamaji, Gardener, etc
This question was answered in Container Call 2024-06-27:
For example, regiocloud supports the Node Failure Tolerance case but not the Zone Failure Tolerance.
Node distribution and things like High Availability, Redundancy, etc.
I think to discuss this topic correctly, most of the wording/concepts need to be established first. I'm going to try and find multiple (if different) sources and link them here for different things.
High Availability
The main goal of HA is to avoid downtime, which is the period of time when a system, service, application, cloud service, or feature is either unavailable or not functioning properly.
(https://www.f5.com/glossary/high-availability)
High availability means that an IT system, component, or application can operate at a high level, continuously, without intervention, for a given time period. ...
(https://www.cisco.com/c/en/us/solutions/hybrid-work/what-is-high-availability.html)
High availability means that we eliminate single points of failure so that should one of those components go down, the application or system can continue running as intended. In other words, there will be minimal system downtime — or, in a perfect world, zero downtime — as a result of that failure.
(https://www.mongodb.com/resources/basics/high-availability)
So things termed High Availability generally try to avoid downtime of their services, with the goal of zero downtime, which is usually not achievable. This can also be seen in this section:
... In fact, this concept is often expressed using a standard known as "five nines," meaning that 99.999% of the time, systems work as expected. This is the (ambitious) desired availability standard that most of us are aiming for. ...
(https://www.mongodb.com/resources/basics/high-availability).
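To make the "five nines" figure from the quote more tangible, here is a rough sketch that converts an availability percentage into the downtime it allows per year. The function name `allowed_downtime_minutes` is made up for illustration, and the calculation assumes a 365-day year; it is not taken from any of the linked sources.

```python
# Sketch: convert an availability target into allowed downtime per year.
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year (in minutes) for a target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999% leaves roughly 5.26 minutes of downtime per year, which shows why the quote calls the target ambitious.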
To achieve these goals, services, hardware or networks are usually provided in a redundant setup, which allows automatic fail-over if instances go down.
Redundancy
In engineering and systems theory, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing reliability of the system...
(https://en.wikipedia.org/wiki/Redundancy_(engineering))
In cloud computing, redundancy refers to the duplication of certain components or functions of a system with the intention of increasing its reliability and availability.
(https://www.economize.cloud/glossary/redundancy)
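As a rough illustration of why duplication increases availability, the following sketch computes the combined availability of several redundant instances. It assumes that the service is up as long as at least one instance is up and that instances fail independently of each other. Both are simplifying assumptions of mine, not statements from the sources above, and `combined_availability` is a hypothetical helper name.

```python
# Sketch: availability of N redundant instances.
# Assumptions: the service works if at least one instance works, and
# instance failures are independent of each other.


def combined_availability(single: float, replicas: int) -> float:
    """Return the availability (0..1) of `replicas` redundant instances."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only if every replica is down at the same time.
    return 1 - (1 - single) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} replica(s): {combined_availability(0.99, n):.6f}")
```

The independence assumption is the part that matters for this discussion: replicas that share a node or a zone can fail together, so the formula overestimates the availability of a setup that is not actually distributed.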
HINT: WILL BE CONTINUED LATER
I brought this issue up in today's Team Container Call and edited the above sections accordingly. As part of #649 we will also get access to Gardener and soon Kamaji clusters.
One thing I want to make you aware of @cah-hbaum: in the call, it was pointed out that the term shared control-plane isn't correct. The control-plane isn't shared; instead, the control-plane nodes are shared, and thus we should always say shared control-plane node.
(I edited above texts accordingly as well to refer to shared control-plane nodes.)
Follow-up for https://github.com/SovereignCloudStack/standards/pull/524
The goal is to set the Node distribution standard to Stable after all discussion topics are debated and decided and the necessary changes derived from these discussions are integrated into the Standard and its test.
The following topics need to be discussed:
Node distribution and things like High Availability or Redundancy? Should this standard only be a precursor for a High Availability standard? (more information under #579)
etcd (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/#external-etcd-topology) be integrated here? (see https://github.com/SovereignCloudStack/standards/pull/524#discussion_r1642540079)