giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273

Add CLI command to delete a workload cluster #1559

Open marians opened 1 year ago

marians commented 1 year ago

User story

Details

Deleting a cluster is not as straightforward as deleting a Cluster CR, and it varies between product generations and potentially also between providers.

The goal here is to provide one CLI command like kubectl gs delete cluster NAME --namespace NAMESPACE which handles all the internal differentiation.

Tasks

marians commented 1 year ago

@calvix explained cluster deletion for CAPA like this:

# create resource manifests
kubectl gs template cluster --provider=capa --organization giantswarm --name=vac01 > ./cluster-vac01.yaml

# create cluster
kubectl apply -f ./cluster-vac01.yaml

# delete cluster
kubectl delete -f ./cluster-vac01.yaml

For CAPI clusters, the kubectl gs template cluster command yields four resource manifests: two App resources and two ConfigMap resources. To delete the cluster, all four resources have to be deleted.

Based on this thread, this is the same for all CAPI providers.

I see a problem with relying purely on resource naming here. Who guarantees that all workload clusters conform to this naming convention? In any case, this might mean that the implementation has to change in the future.

Topics to address in the spec:

marians commented 1 year ago

Spec v1

For your feedback

General description

Command to delete workload cluster resources on a management cluster. In terms of semantics, we provide this command to express that a certain workload cluster should not exist. This means that the command does whatever is required to remove resources belonging to this cluster, even if the cluster has not been created completely.

General syntax

kubectl gs delete cluster NAME [FLAGS]

Flags

Example commands

Delete cluster abc12 owned by organization acme, which is expected to be found in namespace org-acme.

$ kubectl gs delete cluster abc12 \
  --organization acme

Delete cluster abc12 in namespace org-acme. The same as above, but using --namespace instead of --organization.

$ kubectl gs delete cluster abc12 \
  --namespace org-acme

Cluster info and confirmation prompt

If cluster and node pool resources exist for the cluster to be deleted, information will be shown and a prompt asks for confirmation:

You are about to delete this cluster from
installation gollum (region eu-westeurope-1):

Name:              abc12
Service priority:  HIGHEST
Description:       Prod cluster K8s 1.25
Created:           12 May 2022 - 5 months ago

Node pools:        2
Worker nodes:      12

Do you really want to delete this cluster? There is no undo!
If yes, please type the installation and cluster name,
separated by a slash, and hit Return.
If no, enter anything else or hit Ctrl-C.

> 

TODO: Decide what to show if no cluster resource, or no node pool resource, exists for the cluster.

Behind the scenes

Special cases

marians commented 1 year ago

Comments from refinement in Team Rainbow

We change the prompt like below:

You are about to delete this cluster:

Name:              abc12
Installation:      gollum
Region:            eu-westeurope-1
Service priority:  HIGHEST
Description:       Prod cluster K8s 1.25
Created:           12 May 2022 - 5 months ago

Node pools:        2
Worker nodes:      12

Do you really want to delete this cluster? There is no undo!
If yes, please type the installation and cluster name,
separated by a slash, and hit Return.
If no, enter anything else or hit Ctrl-C.

> gollum abc12

Error: Installation and cluster name do not match the format 'gollum/abc12'.

$> 

Question:

marians commented 1 year ago

Spec v2

For your feedback

Changes compared to Spec v1:

General description

Command to delete workload cluster resources on a management cluster. In terms of semantics, we provide this command to express that a certain workload cluster should not exist. This means that the command does whatever is required to remove resources belonging to this cluster, even if the cluster has not been created completely.

General syntax

kubectl gs delete cluster NAME [FLAGS]

Flags

Example commands

Delete cluster abc12 and find the namespace automatically:

$ kubectl gs delete cluster abc12
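
The spec does not say how the namespace would be found automatically. As a rough sketch of one possible approach, assuming the lookup works on a list of "NAMESPACE NAME" pairs for App resources, the hypothetical helper find_cluster_namespace below succeeds only when exactly one namespace contains a matching name. The function name and the input format are assumptions made for illustration.

```shell
# Hypothetical helper, not part of kubectl-gs.
# $1    = cluster name
# stdin = lines of the form "NAMESPACE NAME"
# Prints the namespace and returns 0 only if exactly one line matches.
find_cluster_namespace() {
  awk -v name="$1" '
    $2 == name { ns[++n] = $1 }   # remember every namespace with a match
    END {
      if (n == 1) { print ns[1]; exit 0 }
      exit 1                      # no match, or ambiguous match
    }'
}
```

The input could, for example, come from kubectl get apps.application.giantswarm.io --all-namespaces --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name. Refusing to act on an ambiguous match seems the safer choice for a destructive command.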

Delete cluster abc12, to be found in namespace org-acme:

$ kubectl gs delete cluster abc12 \
  --namespace org-acme

Show what deleting cluster abc12, to be found in namespace org-acme, would mean:

$ kubectl gs delete cluster abc12 \
  --namespace org-acme \
  --dry-run

Cluster info and confirmation prompt

If cluster and node pool resources exist for the cluster to be deleted, information will be shown and a prompt asks for confirmation:

You are about to delete this cluster:

Name:              abc12
Installation:      gollum
Region:            eu-westeurope-1
Service priority:  HIGHEST
Description:       Prod cluster K8s 1.25
Created:           12 May 2022 - 5 months ago

Node pools:        2
Worker nodes:      12

Do you really want to delete this cluster? There is no undo!
If yes, please type the installation and cluster name,
separated by a slash, and hit Return.
If no, enter anything else or hit Ctrl-C.

> gollum abc12

Error: Installation and cluster name do not match the format 'gollum/abc12'.

$> 

TODO: Decide what to show if no cluster resource, or no node pool resource, exists for the cluster.
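
The confirmation check shown in the prompt above could be sketched as follows. This is only an illustration, assuming the typed input must equal the installation and cluster name joined by a slash. The function names confirmation_matches and check_confirmation are hypothetical.

```shell
# Hypothetical helpers, not part of kubectl-gs.
# $1 = installation, $2 = cluster name, $3 = text the user typed.

# Returns 0 only if the input is exactly "<installation>/<cluster>".
confirmation_matches() {
  [ "$3" = "$1/$2" ]
}

# Prints the error message from the spec to stderr on a mismatch.
check_confirmation() {
  if confirmation_matches "$1" "$2" "$3"; then
    return 0
  fi
  printf "Error: Installation and cluster name do not match the format '%s/%s'.\n" "$1" "$2" >&2
  return 1
}
```

With this sketch, check_confirmation gollum abc12 "gollum abc12" fails with the error shown in the prompt example, while "gollum/abc12" passes.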

Dry-run output

The following resources will be deleted directly:

NAMESPACE   KIND       NAME                            API GROUP                  VERSION
org-acme    App        abc12                           application.giantswarm.io  v1alpha1
org-acme    App        abc12-default-apps              application.giantswarm.io  v1alpha1
org-acme    App        abc12-nginx-ingress-controller  application.giantswarm.io  v1alpha1
org-acme    ConfigMap  abc12-userconfig                                           v1
org-acme    ConfigMap  abc12-default-apps-userconfig                              v1

Note: This list only shows resources that will be deleted through kubectl-gs directly.
More resources (cluster, node pools) will be deleted as a consequence, if they exist.
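
As a rough sketch of how the naming convention translates into this list, the hypothetical helpers below derive the two App and two ConfigMap names from the cluster name and print them in a simplified dry-run table. The function names are assumptions, and additional apps such as abc12-nginx-ingress-controller are not covered, since they cannot be derived from the cluster name alone.

```shell
# Hypothetical helpers, not part of kubectl-gs.

# Print "KIND NAME" pairs implied by the naming convention for cluster $1.
expected_resources() {
  cluster="$1"
  printf 'App %s\n' "$cluster"
  printf 'App %s-default-apps\n' "$cluster"
  printf 'ConfigMap %s-userconfig\n' "$cluster"
  printf 'ConfigMap %s-default-apps-userconfig\n' "$cluster"
}

# Print a simplified dry-run table for namespace $1 and cluster $2.
print_dry_run() {
  namespace="$1"
  cluster="$2"
  printf '%-11s %-10s %s\n' NAMESPACE KIND NAME
  expected_resources "$cluster" | while read -r kind name; do
    printf '%-11s %-10s %s\n' "$namespace" "$kind" "$name"
  done
}
```

Calling print_dry_run org-acme abc12 prints a header followed by four rows. A real implementation would presumably look up the resources on the management cluster rather than trust the convention.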

Behind the scenes

Special cases

marians commented 1 year ago

Addition to Spec v2

If a workload cluster is managed through GitOps/Flux, deleting it via the CLI command should be prevented, just as it is in the web UI.

We will use the existence of the two labels kustomize.toolkit.fluxcd.io/name and kustomize.toolkit.fluxcd.io/namespace on the cluster app resource as an indicator for this.

In this case, the attempt to delete such a cluster should yield the following error message:

Error: cluster 'abc12' is managed through GitOps. It cannot be deleted using this command. Instead, delete the defining resources from the source git repository.
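
The label check described above could be sketched like this. It is a minimal illustration, assuming the label keys of the cluster app resource are supplied one per line on stdin. The function names is_gitops_managed and refuse_if_gitops are hypothetical.

```shell
# Hypothetical helpers, not part of kubectl-gs.

# stdin = label keys of the cluster App resource, one per line.
# Returns 0 only if both Flux kustomize labels are present.
is_gitops_managed() {
  keys="$(cat)"
  printf '%s\n' "$keys" | grep -qxF 'kustomize.toolkit.fluxcd.io/name' &&
    printf '%s\n' "$keys" | grep -qxF 'kustomize.toolkit.fluxcd.io/namespace'
}

# $1 = cluster name; stdin as above.
# Prints the error from the spec and returns 1 for GitOps-managed clusters.
refuse_if_gitops() {
  if is_gitops_managed; then
    printf "Error: cluster '%s' is managed through GitOps. It cannot be deleted using this command. Instead, delete the defining resources from the source git repository.\n" "$1" >&2
    return 1
  fi
}
```

The label keys could, for example, be produced with kubectl get apps.application.giantswarm.io abc12 -n org-acme -o go-template='{{range $k, $v := .metadata.labels}}{{println $k}}{{end}}' and piped into refuse_if_gitops abc12.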

marians commented 1 year ago

Blocked by #1599

gusevda commented 1 year ago

@marians based on this PR description

All apps for a cluster not being managed by cluster-apps-operator are being deleted when cluster CR is deleted.

How is the --keep-apps flag going to work in this case?

marians commented 1 year ago

Talked with Dmitry.

Consider it removed from the spec.

weatherhog commented 2 weeks ago

@gusevda @marians are you okay with closing this one? Or is this still needed even with flux?

marians commented 1 week ago

Still relevant to me