Availability Zones: standardized levels of independence.

josephineSei commented 3 months ago

Availability Zones are a concept in OpenStack.

As a user of an SCS-conformant cloud, I want to know what I can expect from AZs overall and what depends on the CSP.

Definition of Done:

Please refer to scs-0001-v1 for details.

[ ] Proposal has been written with name of the form scs-xxxx-v1-slug.md (only substitute slug)
[ ] Proposal has the fields status, type, track set
[ ] Proposal has been voted upon in the corresponding team
[ ] Status has been changed into Draft, file renamed: xxxx replaced by document number
[ ] If applicable: test script has been written (this item may be moved into a separate issue so long as the state is Draft)

josephineSei commented 3 months ago

Why are we talking about AZs: AZs focus on redundancy and failure safety on IaaS-Level.

While redundancy at the lowest level could be just something like having replication in the storage backend, so there is no data loss in the case of a hardware failure, the requirements can be as hard as having a remote mirror of all data.

AI: We should at least document what the different levels of redundancy mean and what failure safety different deployments can provide (@anjastrunk maybe the latter would be something for gaia-x self-descriptions?)

Pre-Requirements

To also allow small deployments or edge deployments, which usually have only a single AZ, we must not require a certain number of AZs. In that case, redundancy and failure safety should be handled by the user on the next higher level (PaaS, CaaS, workload...).

We should rather define, and check, under which conditions AZs can be defined and used.

What can AZs be based on / What can they separate

AZs are implemented in various ways for different resources, as those resources belong to different OpenStack services.
There are AZs for Nova (Compute), Cinder (Block Storage) and Neutron (Networking)

AZs are logical separations with a chance of physical separation. AZs can be defined:

for nodes with different power supplies
for nodes in different fire zones, separated by strong fire safety mechanisms (thinking about a whole deployment that burns down)
splitting AZs rack-based (having the top-of-rack switch or power supply as single point of failure)
having a chance for planned maintenance (e.g. upgrade one AZ after another, and telling customers to "just" switch AZs) -> maybe with rolling upgrades that is not common anymore
to distinguish between backends (KVM vs another hypervisor). For storage there is always the possibility to use different volume types, so this is mainly applicable to Compute
for security reasons: either one AZ for one customer or one AZ for one physical node for tenant separation on hypervisors (there are other options and maybe better ways to accomplish this)

Problems:

users need to understand what the difference between two AZs is, but they have no knowledge of the underlying infrastructure, the capacity of the AZs or how full (of VMs or volumes) an AZ is right now. So in many cases users might just guess or take some default.

Restrictions:

in Nova a physical host can only be mapped to one AZ (or maybe the compute service running on it)
the Nova config option cross_az_attach allows or disallows attachment of volumes from other AZs.
in Cinder volume services are mapped to AZs
most Cinder backends already have built-in redundancy which makes having Cinder AZs dispensable
in Cinder and Neutron, AZs are hardcoded in the config ( https://docs.openstack.org/neutron/latest/admin/config-az.html#availability-zone-of-agents and https://docs.openstack.org/cinder/latest/configuration/block-storage/config-options.html); if no AZ is configured, Cinder automatically defaults to the AZ name nova
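
To make the effect of cross_az_attach more tangible, here is a minimal sketch of the decision it controls. This is a simplified model written for this discussion, not Nova's actual code: the function name and its parameters are hypothetical, and it ignores corner cases such as instances without an assigned AZ.

```python
def attach_allowed(instance_az: str, volume_az: str, cross_az_attach: bool) -> bool:
    """Simplified model of the volume attach check (not Nova's implementation).

    instance_az:      AZ of the compute host the instance runs on
    volume_az:        AZ of the Cinder volume
    cross_az_attach:  value of the Nova config option of the same name
    """
    if cross_az_attach:
        # Attachments across AZ boundaries are permitted.
        return True
    # Otherwise the volume has to live in the same AZ as the instance.
    return instance_az == volume_az
```

With cross_az_attach disabled, a deployment that has several compute AZs but only one Cinder AZ with a different name would reject every attachment in this model, which is one reason why the AZ names of both services have to be planned together.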

A good but a bit outdated overview was presented at the Summit in 2018 ( https://www.youtube.com/watch?v=a5332_Ew9JA )

Proposal:

[ ] The SCS should not require any AZs
[ ] The SCS should gather information from all CSPs about their usage of AZs
[ ] A DR should be written to define what types of AZ are recognized by the SCS (What do we want to achieve when having AZs)
[ ] The DR should also include which services of OpenStack having AZs are recognized (e.g. There is little value in having Volume AZs when most backends already provide redundancy)
[ ] Levels of Redundancy / Failure Safety should be defined at a very high-level point of view for IaaS and maybe other Layers too

josephineSei commented 3 months ago

I created a hedgedoc for CSPs to talk about their AZ usage: https://input.scs.community/Availability-Zone-Usage#

josephineSei commented 2 months ago

Up until now, there was not much input - so I put it on the agenda again for the next IaaS call

josephineSei commented 1 month ago

A few CSPs answered the questions in the hedgedoc, so we can go on with the work on AZs. The hedgedoc also contains a proposal for what to use.

The problem here is that deployments which use AZs differently might not be able to change, because changing the AZ architecture is quite fundamental. So all those other deployments would automatically be rendered scs-incompatible.

Another option is to use the failsafe levels that will be defined in https://github.com/SovereignCloudStack/standards/issues/527, this would be more vague - we should discuss, whether we want this or not.

josephineSei commented 3 weeks ago

We should base our standard on the Taxonomy DR (https://github.com/SovereignCloudStack/standards/pull/579) and start a requirement analysis from those levels.

Additionally the answers from CSPs in the hedgedoc are quite helpful, as they already proposed some minimum setup for AZs:

AZ definition

    AZs must be in separate fire protection zones
    AZs must have independent power supplies
    AZs must have independent cooling
    AZs should have independent uplinks to the internet
        plusserver: must
    AZs must not depend on a single core router
    AZs must have high bandwidth, low-latency (<3ms RTT) interconnection
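
The list above can be read as a checklist. The following is a rough sketch of how such a check could look, not a proposed test script: the AZ data model, the field names and the check_azs function are assumptions made for illustration, and "must not depend on a single core router" is interpreted here as "each AZ has at least two core routers", which is only one possible reading. The uplink item is reported as a warning because it is a "should" in the list.

```python
from dataclasses import dataclass
from itertools import combinations

MAX_RTT_MS = 3.0  # from the "<3ms RTT" item in the list above


@dataclass(frozen=True)
class AZ:
    """Hypothetical description of one AZ, as a CSP might document it."""
    name: str
    fire_zone: str
    power_supply: str
    cooling: str
    uplink: str
    core_routers: frozenset  # core routers this AZ can use


def check_azs(azs, rtt_ms):
    """Return (violations, warnings) for a list of AZ descriptions.

    rtt_ms maps frozenset({name_a, name_b}) to a measured RTT in milliseconds.
    """
    violations, warnings = [], []
    for az in azs:
        if len(az.core_routers) < 2:
            violations.append(f"{az.name}: fewer than two core routers")
    for a, b in combinations(azs, 2):
        pair = f"{a.name}/{b.name}"
        # "must" items: fire zone, power supply and cooling may not be shared
        for attr in ("fire_zone", "power_supply", "cooling"):
            if getattr(a, attr) == getattr(b, attr):
                violations.append(f"{pair}: shared {attr}")
        # "should" item: independent uplinks
        if a.uplink == b.uplink:
            warnings.append(f"{pair}: shared uplink")
        # a missing measurement counts as a violation as well
        rtt = rtt_ms.get(frozenset((a.name, b.name)))
        if rtt is None or rtt >= MAX_RTT_MS:
            violations.append(f"{pair}: RTT not below {MAX_RTT_MS} ms")
    return violations, warnings
```

Keeping "must" and "should" items in separate result lists makes it easy to turn the uplink item into a hard requirement later, should the discussion go that way.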

We also have to consider that AZs exist for Compute, Network and Storage independently. Some of the resulting problems can easily be mitigated by configuration (storage), while others are not easily manageable in OpenStack:

    Compute hosts are always per AZ (in multi-AZ setups)
    Block storage may be global (=per-region) and setup such that it survives AZ failure (preferred option) OR may use the same AZs as compute
    Network service should NOT be per AZ
        To be discussed
        it is a major inconvenience for users to have per-AZ networks
        As network AZ hints are not ignored (despite the name “hint”) in a single-AZ setup, there is no reasonable way to define IaC setups (e.g. with opentofu HCL) that work both on setups with and on setups without network AZs

Requirements

AZs should represent parts of the same deployment that are independent of each other
AZs should be able to take workload from another AZ in a Failure Case of Level 3 (in other words: the destruction of one AZ will not automatically include destruction of the other AZs)
- Compute: resources are bound to one AZ, replication cannot be guaranteed, downtime or loss of resources is most likely
- Storage: highly dependent on the storage configuration, replication even over different AZs is part of some storage backends
- Network: network resources are also stored as configuration pattern in the DB and could be materialized in other parts of a deployment easily as long as the DB is still available.
We should not require AZs to be present (== allow small deployments and edge use cases)
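
One way to read "take workload from another AZ" is as a capacity question. A small sketch under that assumption follows; it is stronger than the "in other words" wording above, which only asks that the other AZs survive. The function name and the abstract "resource units" are hypothetical.

```python
def can_absorb_az_loss(capacity, used):
    """For every AZ, tell whether the remaining AZs could take over its workload.

    capacity and used map AZ name -> abstract resource units (e.g. vCPUs).
    Returns a dict AZ name -> bool.
    """
    result = {}
    for lost in capacity:
        # Free capacity in all AZs except the one assumed to be destroyed.
        free_elsewhere = sum(
            capacity[az] - used.get(az, 0) for az in capacity if az != lost
        )
        result[lost] = used.get(lost, 0) <= free_elsewhere
    return result
```

For a single-AZ deployment the function always reports False as soon as anything is running, which matches the point that such deployments have to handle redundancy on a higher layer.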

Decisions

AZs should only occur within the same deployment and have an interconnection that represents that (we should not require specific numbers in bandwidth and latency.)
We should separate between AZs for different resources (Compute, Storage, Network)
- Compute needs AZs (because VMs may be single point of failure) if failure case 3 may occur (part of the deployment is destroyed, if the deployment is small there will be no failure case three, as the whole deployment will be destroyed)
- Storage should either be replicated over different zones (e.g. fire zones) that are equivalent to compute AZs or also use AZs
- Network does not need AZs
Power supply may be confused with power line in. Maybe a PDU is what we should talk about - those need to exist for each AZ independently.
When we define fire zone == compute AZ, then every AZ of course has to fulfill the guidelines for a single fire zone. Maybe this should be stated implicitly rather than explicitly.
internet uplinks: after the destruction of one AZ, uplink to the internet must still be possible (that can be done without requiring separate uplinks for each AZ.)
each AZ should be designed with as few single points of failure as possible (e.g. a single core router) to avoid a situation where a failure of class 2 disables a whole AZ and so leads to a failure of class 3.

josephineSei commented 6 days ago

In today's IaaS call, we discussed a few open questions:

Network AZ

In the standard I discussed that it is possible to have Network AZs, but that this has downsides for users. Thus I did not make any recommendations. We discussed whether we even want to discourage CSPs from using them ("SHOULD NOT"):

it has been brought up that it is hard to configure and not nice to use for users
@garloff: discourage or even forbid usage of network AZs
@berendt: should not be forbidden, there are use cases
These are really not nice for users, we should discourage it (but not disallow)
- ToDo: Ask for more use cases, maybe we can not even discourage

Cross-Attach AZ

The question was whether we want to encourage / allow / discourage or disallow this.

so far, nearly no CSP uses this according to Hedgedoc input
@garloff: unlike for network it is not obvious that I can attach volumes from other AZs
when using Ceph, you'd normally have a global cross-AZ for storage (but not several storage AZs)
if not using Ceph, implementation would be hard, we should not request this from CSPs
- Use-case wavecon: Local dedicated (per AZ) ceph clusters, no support for x-attaching
@artificial-intelligence: X-attach would negatively impact isolation between AZs (and performance)
Maybe transparency is the most important feature here?
important to distinguish between replicating storage between AZs vs. cross-attaching volumes across AZs

Overall

We cannot define in every detail how DCs should be built for highest availability
Reference DC taxonomies / BSI taxonomy for this
SCS can be useful by providing some minimal bounds that allow users to have a meaningfully higher chance to survive by spreading over several AZs
Highest level of redundancy will always be achieved by replicating data over several regions
- Can we define something with "AZ"s that's better than nothing (though never as good as regions)?
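
To put the "meaningfully higher chance to survive" into numbers, a back-of-the-envelope sketch: it assumes that AZ failures are fully independent, which is exactly what the minimal bounds would try to approximate and which real deployments only reach in part. The function name is hypothetical and any probability passed in is an illustrative input, not a measured value.

```python
def p_all_azs_lost(p_single: float, n_azs: int) -> float:
    """Probability that all n_azs AZs fail together.

    Assumes every AZ fails with the same probability p_single and that
    the failures are independent of each other.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_azs < 1:
        raise ValueError("n_azs must be at least 1")
    return p_single ** n_azs
```

The less independent the AZs are (shared power, shared core router, same fire zone), the further the real value moves away from this product and towards the single-AZ probability, so the benefit of spreading over AZs depends directly on the bounds SCS would define.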
