New CAPI cluster id / name validations

alex-dabija commented 2 years ago

Story

As a cluster admin, I want to have longer cluster and machine pool identifiers in order to have meaningful names for my clusters / machine pools, to quickly understand their use and role.

New restrictions on cluster / machine pool ids / names:

it can't start with a number because of restrictions in Cluster API (CAPI);
max length of 10;

These restrictions must be applied to cluster, machine pool and machine deployment CRs.

Background

Giant Swarm clusters have the following restrictions for cluster ids (names):

5 characters max;
customers use longer names;
only enforced in Happa and not in other clients.

In a recent KaaS Sync meeting we agreed to relax the restrictions for ids and names.

Changes

[ ] Add common Kyverno rule for all providers to implement the new restrictions for cluster, machine pool and machine deployment CRs;

marians commented 2 years ago

I think we have not yet discussed whether our defaulting (generated identifiers in the case the user doesn't provide them) should also change according to these new rules.

My opinion: we can continue creating default identifiers compatible with our current rule set (5 chars, numbers and letters etc.). No need to use the full permitted length. In case this is not happening yet, we should make sure to avoid the leading digit.

alex-dabija commented 2 years ago

From KaaS Sync:

@puja108: would it be possible to update the CRDs and use schema validation instead? @fiunchinho: not really, we want to sync the CRDs from upstream.

pipo02mix commented 2 years ago

The cluster 3s2ib in helios has a patched for the app operator that should be reverted once this is fixed

cornelius-keller commented 2 years ago

TODO for @giantswarm/team-rocket: Check if clusters with 10 characters long names work.

@alex-dabija @marians who will implement a generic kyverno policy to check this for CAPI clusters?

alex-dabija commented 2 years ago

@alex-dabija @marians who will implement a generic kyverno policy to check this for CAPI clusters?

We agreed that's either Phoenix or Rocket. We would have to decide between us.

gawertm commented 1 year ago

@alex-dabija is there any progress on this one? is it even still relevant? based on this comment I feel it snot needed? https://github.com/giantswarm/roadmap/issues/1444#issuecomment-1269323554

marians commented 1 year ago

IMO still relevant

alex-dabija commented 1 year ago

Yes, this is still relevant because we want to have long cluster names by default without needing a special flag.

mogottsch commented 1 year ago

For clarification, for this issue to be done we need

kyverno policies that check
- .metadata.name in clusters for the new restrictions
- .spec.clustername in machinedeployments for the new restrictions
- .spec.clustername in machine pools for the new restrictions

Is that correct? @alex-dabija

marians commented 1 year ago

@mogottsch In my opinion, a policy on .spec.clustername is not needed. The logical requirement for that one is to reference an existing cluster name. That should be enough.

mogottsch commented 1 year ago

I agree. The original issue mentioned "machine pool and machine deployment", that's why I also looked at those CRs. I'm not sure whether I should check anything on them, too.

marians commented 1 year ago

It does make sense to have the policy restrict .metadata.name on machine pool and machine deployment :)

mogottsch commented 1 year ago

Ok, are we talking about the cluster name that is usually contained in the name of the machine deployment and machine pool? E.g. the cifqkz7w5s in

cifqkz7w5s-nodepool0
cifqkz7w5s-bastion

Or are we talking about the whole name? Because the whole name seems to be longer than 10 characters very often.

marians commented 1 year ago

I was not aware that we include the cluster name in the machine pool or machine deployment metadata.name value. Just saw it happening in CAPA. I'd like to understand where this is coming from and will ask some questions.

marians commented 1 year ago

This prefixing of node pool names with the cluster name and a dash is happening both on CAPA and CAPZ (see internal Slack thread). The reason: it is then easier to associate resources (e. g. in the infra provider) with the cluster.

As we know, we want cluster names to be up to 10 characters long.

For our naming restrictions: If a node pool name (as specified by the user in the cluster-app config values) can also be up to 10 characters long, this means that <CLUSTER>-<NODEPOOL> will be up to 21 characters.

mogottsch commented 1 year ago

Ok, thanks for investigating!

mogottsch commented 1 year ago

Some customers use machine deployments that start with a number. This resulted in problems when we released the new policies. We, therefore, reverted the PR. Related slack thread

@paurosello asked whether the number-related rule is necessary for machine pools/machine deployments. I am not sure where the original requirements came from.

gawertm commented 1 year ago

ok thx

mogottsch commented 1 year ago

@alex-dabija Since you created the issue, could you give a hint where these restrictions on the names come from (especially the names should not start with numbers one)?

If we want to keep that rule, we will have to check each installation for problematic machine deployments/machine pools and ask customers to migrate to new names.

marians commented 1 year ago

I am pretty sure that our naming rule "must not start with number" (or dash, actually) originated in the assumption that we want to use the exact name for Kubernetes resources.

As https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names explains, most Kubernetes resource names are restricted by the rules of RFC 1123 (DNS Subdomains). These require an alphanumeric character ([a-z]) to be in the first position.

However, for node pools (machinedeployments or machinepools, depending on implementation) this seems to be no longer relevant, as we prefix the resource names with the cluster name and a dash anyway.

mogottsch commented 1 year ago

Thanks for the explanation @marians

One thing to remark: alphanumeric characters can be numbers. But on the same page, there is also a reference to another RFC that only allows alphabetic characters in the first position, so I get the point.

In that case, I suggest that we drop the rules, that forbid machine pools and machine deployments to start with a number. That way we do assure that machine deployments and node pools do not start with a number in vintage, but we do implicitly assure it in CAPI. Therefore, we will reach our naming conventions at some point, without the immediate migration effort.

marians commented 1 year ago

Sounds good to me. Don't see me as the gatekeeper on this one. From a UX point of view, leading numbers are not an issue.

mogottsch commented 1 year ago

There is currently a machine pool in goat that violates the rule that restricts names to 21 characters.

Since replacing it is not a straightforward task, Phoenix will handle it, and we will postpone the release of the new policies until that is completed.

gawertm commented 1 year ago

so we keep this one in blocked until then?

mogottsch commented 1 year ago

Yes.

mogottsch commented 1 year ago

The machine pool was renamed. This is not blocked anymore.

mogottsch commented 1 year ago

After another investigation, we found that too many vintage installations violate these policies. Therefore, we decided to only roll out the policies for CAPI installations. The requirement for these policies originates from CAPI constraints, therefore we are OK with not having these in vintage.

mogottsch commented 1 year ago

One of our CAPI test installations also has a cluster that violates this policy. Context: https://gigantic.slack.com/archives/C02GDJJ68Q1/p1695214416034399 Before releasing, I will add a PolicyException for that cluster, as we won't rename it right now and it is only used for testing.

giantswarm / roadmap