Closed garloff closed 3 months ago
@jschoone @garloff It seems that this existing epic does for the CaaS track what I intended the new epic https://github.com/SovereignCloudStack/standards/issues/285 to do for the IaaS track. I guess it remains to compare the description here with the table https://input.scs.community/tqKlv1Z_Srmi5e5o76CxhQ?view#KaaS-Layer that I took from Kurt's slides and maybe update accordingly? For instance, two standards have already been ticked off, even though we still need to implement the conformance tests -- @cah-hbaum will write the corresponding issues, and I could then add those to this epic. Please tell me if you disagree with anything I just wrote.
Comparison between this epic and the table from Kurt's ALASCA talk slides
Please check what should be added here or what I did wrong @garloff @jschoone.
TL;DR: I want them all to be considered and discussed. Not all of them necessarily become a mandatory standard. Maybe some of them don't even become a recommendation.
* Present in this epic, but missing in the slides (really? or did I just fail to align them?)
  * LBs don't require special annotations (upstream nginx deployment works out of the box): [Service type LoadBalancer with externalTrafficPolicy: Local needs to work out of the box, SovereignCloudStack/issues#212](https://github.com/SovereignCloudStack/issues/issues/212)
The thing here is that upstream nginx uses externalTrafficPolicy: Local and assumes that:
(1) traffic is only routed to the nodes that run the nginx container -- which requires a health monitor to be configured, and on many LBs (including the Octavia one) that requires a special annotation or a changed default;
(2) the original client IP is visible and not obscured by the LB -- i.e. an L2/L3 LB instead of L4. Yet the OCCM tends to prefer HTTP L7 health checks ... Discussion on this is in SovereignCloudStack/issues#212 and numerous subsequent issues, indeed.
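To make the failure mode concrete, here is a minimal Service manifest of the kind the upstream nginx deployment ships. The health-monitor annotation is the OCCM knob referred to above -- treat the exact annotation key as an assumption to verify against the occm documentation:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  annotations:
    # Assumed OCCM annotation: without an enabled health monitor, the Octavia
    # LB also forwards traffic to nodes that run no nginx pod, where it is
    # dropped because of externalTrafficPolicy: Local.
    loadbalancer.openstack.org/enable-health-monitor: "true"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserves the original client source IP
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - port: 80
      targetPort: http
```

Without the annotation (or a changed provider default), the manifest above works only by accident on nodes that happen to host an nginx pod -- which is exactly why the epic asks for this to work out of the box.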
* ControlPlane and Worker machine flavors and counts (translation from SCS flavors needed for non-SCS IaaS?)
For both ControlPlane and Worker nodes, their number and their flavors need to be configurable. The mandatory SCS flavors need to be accepted for the latter. (Sidenote: This is a cluster-management feature, not a cluster property -- the latter being something you can rely on once a cluster exists.)
* Present in the slides, but missing in this epic:
  * CNCF conformance tests (not linked to any issue so far)
We have the sonobuoy binary installed on the management cluster and run it to test the workload clusters for CNCF conformance. So we have the tooling to test CNCF conformance, and we want to require CNCF conformance for all clusters.
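For illustration, the sonobuoy invocation for the certified-conformance suite looks roughly like this (run against the kubeconfig of the workload cluster under test):

```shell
# Run the full CNCF certified-conformance suite and wait for completion
sonobuoy run --mode=certified-conformance --wait

# Summarize pass/fail, fetch the results tarball, and inspect it
sonobuoy status
results=$(sonobuoy retrieve)
sonobuoy results "$results"

# Clean up the sonobuoy namespace and cluster-wide resources
sonobuoy delete --wait
```

A conformance test for this standard could then simply require that the retrieved results report zero failed tests.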
* K8s version support period (not linked to any issue so far)
  * Note: "Offered K8s version recency" is present as [Supported k8s versions, SovereignCloudStack/issues#219](https://github.com/SovereignCloudStack/issues/issues/219)
We have a standard on this: scs-0210-v1. Maybe we need to amend it so that providers must not drop support for a minor k8s version before upstream stops security support (~14 months after a release). And maybe we should recommend that, for managed clusters, the provider sends a warning to users when one of their clusters enters the extended support period (after ~12 months) and aligns the needed upgrades?
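The proposed amendment translates into a simple date check; a minimal sketch, assuming the ~14-month upstream security-support window and the ~12-month warning threshold mentioned above (both approximated as 30-day months):

```python
from datetime import date, timedelta

# Assumed windows from the discussion above: upstream security support ends
# ~14 months after a minor release; "extended support" warnings start at ~12.
UPSTREAM_SUPPORT = timedelta(days=14 * 30)
WARNING_PERIOD = timedelta(days=12 * 30)

def support_status(release_date: date, today: date) -> str:
    """Classify the support phase of a k8s minor version for a managed cluster."""
    age = today - release_date
    if age >= UPSTREAM_SUPPORT:
        return "unsupported"   # provider may drop it; upgrade is overdue
    if age >= WARNING_PERIOD:
        return "warn"          # recommendation: provider warns the customer
    return "supported"
```

A conformance test could feed in the upstream release dates of all k8s versions a provider offers and require that none of them is "unsupported".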
* Identity federation via OIDC, [Understand the requirements towards the IdP Broker to support the container layer, SovereignCloudStack/issues#194](https://github.com/SovereignCloudStack/issues/issues/194)
* Machine identities, [Implement Machine Identities, SovereignCloudStack/issues#163](https://github.com/SovereignCloudStack/issues/issues/163)
* Control plane backup/maintenance, [etcd maintenance, k8s-cluster-api-provider#258](https://github.com/SovereignCloudStack/k8s-cluster-api-provider/issues/258)
* Kube API access controls, [Add ability to limit access to k8s API, k8s-cluster-api-provider#246](https://github.com/SovereignCloudStack/k8s-cluster-api-provider/issues/246)
* Container registry (opt-in), [Container registry: Create overview of needed and desirable features and map OSS solutions against it, SovereignCloudStack/issues#263](https://github.com/SovereignCloudStack/issues/issues/263)
* Cluster management API, [SCS K8s cluster standardization, SovereignCloudStack/issues#181](https://github.com/SovereignCloudStack/issues/issues/181)
* Gitops controller for Cluster Mgmt (not linked to any issue so far)
We had some concepts written down for this -- and determined that it should be optional (for the customer). This should become a requirement for the to-be-developed cluster stacks: provide the ability for the cluster parameters to be pulled from a git repo (using tooling like Flux or Argo CD).
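With Flux, the opt-in gitops path could look roughly like the sketch below -- repo URL, names, and paths are all hypothetical, only the two Flux resource kinds are real:

```yaml
# Sketch: reconcile the cluster parameters from git instead of applying
# them by hand. All names, URLs, and paths below are illustrative.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: cluster-params        # hypothetical name
  namespace: flux-system
spec:
  interval: 5m
  url: https://git.example.com/acme/cluster-params   # hypothetical repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-params
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: cluster-params
  path: ./clusters/prod       # hypothetical path
  prune: true
```

The customer-facing choice is then simply whether such a pair of resources is created for their cluster or not.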
I did not check these for completeness, but everything above looks desirable to me.
Note: I believe we have two kinds of standards here:
(1) What are the properties of the created clusters?
(2) What is the standardized parameter format and API to create, modify, and delete clusters?
@garloff I amended the description of this issue by everything that hadn't been in there. Maybe we can now go ahead and group the items a bit, like I did in https://github.com/SovereignCloudStack/standards/issues/285.
I updated the epic and grouped everything a bit more together. But I think in the long run, something like a table would be better, since the "pre"-work for the standard issues is done in other issues or over multiple ones. I can make a table here, so that the whole thing gets grouped better, if that is desired.
I created individual issues for nearly all points not yet covered by previous issues. I left a few open, since they seemed way too general and broad.
@cah-hbaum That sounds great! I also like the new structure in the description above. 👍👍👍
[ ] SovereignCloudStack/issues#421
[ ] SovereignCloudStack/issues#224
[ ] SovereignCloudStack/issues#194
[ ] SovereignCloudStack/issues#163
[ ] SovereignCloudStack/issues#417
[ ] SovereignCloudStack/issues#386
[ ] SovereignCloudStack/issues#214
[ ] SovereignCloudStack/issues#434
[ ] SovereignCloudStack/issues#212
Closing in favor of https://github.com/SovereignCloudStack/standards/issues/615.
As DevOps team (=SCS user), I want to have the ability to create and use clusters on many different SCS-compliant container providers, where all relevant properties are either predefined by the SCS standard or can be controlled by a provider-independent cluster-settings.yaml file. Relevant properties are those that tend to create trouble for application deployment, e.g. k8s versions, CNI features, persistent volumes, ingress/load-balancers, anti-affinity rules (avoiding having k8s nodes on the same host) ...
These properties should either be fixed by SCS (and then of course only evolve slowly over time) or be controllable by the customer (via a standardized, provider-independent cluster-params.yaml). For the controllable properties, we mandate existence and syntax, and we may mandate all or some of the supported options. In any case, the supported options need to be discoverable (and the mechanism for discoverability should include the fixed properties as well). Note that there is value in standardizing things that are not mandatory, in order for providers to use the same name/semantics for the same things. (Obviously, optional features may become mandatory for providers in the future if we decide so.)
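A sketch of what such a provider-independent parameter file could look like -- every key below is hypothetical; the standard would fix the actual names and semantics:

```yaml
# Hypothetical cluster-params.yaml: all keys are illustrative, not standardized.
name: prod-cluster
kubernetes:
  version: "1.27"            # must be one of the discoverable supported versions
controlPlane:
  count: 3
  flavor: SCS-2V-4           # mandatory SCS flavor names must be accepted
workers:
  - count: 5
    flavor: SCS-4V-16
    antiAffinity: true       # k8s nodes of this pool not on the same host
network:
  cni:
    networkPolicies: required
loadBalancer:
  externalTrafficPolicyLocal: true
extensions:                  # provider extensions, clearly namespaced
  example.com/gpu-nodes: 2
```

The `extensions` section illustrates the extensibility hint below: provider-specific keys must be clearly distinguishable from standardized ones.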
Hints:
Extensibility: We allow for extensions, but they must be clearly distinguishable from standardized properties.
This epic should list the standardization proposals / ADRs as issues that we as the SCS community want to define as relevant for SCS compliance. Some of the proposals might not make it into a v1 of the SCS standard (because they are not ready, deemed not important enough, or downgraded to recommendations). The individual proposed properties / ADRs should come with a rationale and with (ideally comprehensive) conformance tests. We want to evolve the reference implementation(s) in parallel to the standardization, but intellectually keep a clear distinction between standards and implementation.
We need to create conformance tests for these properties; it is useful to define standards in terms of tests that must pass. (Test-driven standardization!) Obviously, using existing test suites (such as CNCF/sonobuoy or aqua/kube-bench) and possibly contributing to them is a good way to do this.
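For properties not covered by existing suites, a conformance check could take the shape sketched below: the standard expressed as assertions over a provider's discovery document. All keys here are invented for illustration; the discoverability mechanism mandated above would define the real ones:

```python
# Hypothetical shape of a test-driven conformance check: the standard is a
# set of assertions over a provider's discovery document (keys invented).
def check_conformance(discovery: dict) -> list[str]:
    """Return a list of violations; an empty list means conformant."""
    violations = []
    if not discovery.get("k8s_versions"):
        violations.append("no supported k8s versions advertised")
    if not discovery.get("load_balancer", {}).get("external_traffic_policy_local"):
        violations.append("Service type LoadBalancer with "
                          "externalTrafficPolicy: Local not supported")
    return violations

print(check_conformance({"k8s_versions": ["1.27", "1.26"],
                         "load_balancer": {"external_traffic_policy_local": True}}))
# prints []
```

Writing the checks first makes the standard precise by construction: a property is standardized exactly when a test exists that a conformant provider must pass.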
Inspiration for the list below:
Individual topics for standardization:
Networking
[ ] Standardize k8s networking policies (CNI)
[ ] Service type LoadBalancer with externalTrafficPolicy: Local needs to work out of the box
[ ] Ingress Support (OPTIONAL)
Container Registry
[x] Container registry feature overview
[ ] Registry Standard from DR SCS-0212
Meta
[x] Supported k8s versions
[x] K8s version support period
[ ] KaaS ControlPlane/worker machine flavors
[ ] Cluster management API
Automation
[ ] KaaS Cluster Management Gitops Controller
[ ] KaaS Gitops/CI tooling
Identity Management
[ ] Understand the requirements towards the IdP Broker to support the container layer
[ ] Implement Machine Identities
[ ] KaaS IAM federation with ID broker
Logging & Metrics
[ ] Metrics server support ~(OPT-OUT)~ (OPTIONAL)
[ ] Logging/Monitoring/Tracing features? (OPTIONAL)
Security & Robustness
[ ] Forward-porting and retesting of upstream Intel patchset for SGX and OpenStack
[ ] ~K8s cluster baseline security setup~ K8s cluster hardening
[ ] Move Keycloak onto kubernetes powered runtime on management plane
[ ] KaaS Optional Cert-Manager
[ ] Distributed K8s nodes to ensure Anti-Affinity
[ ] KaaS Robustness features
Storage
Tests
Definition of Done: