calvix closed this issue 1 year ago
Upstream issue: https://github.com/kubernetes-sigs/cluster-api/issues/7559
The general consensus is that it's an ordering issue in the way things are deleted. I need to dig into the CAPA and CAPI code to work out which one isn't waiting for the other to complete its deletion before moving on to the next resource.
I was hoping we could work around the issue using the deletion-blocker-operator, but that only prevents the CR from being removed. There's no way to prevent CAPA from performing the infrastructure tear-down once the delete is requested.
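To illustrate why an extra finalizer does not help here, the following is a toy model of finalizer semantics, not code from CAPA or deletion-blocker-operator. The `Resource` class, the function names and the finalizer strings are all hypothetical; the only behaviour it relies on is that a delete request sets `deletionTimestamp` and that the object is removed once its finalizer list is empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """Toy stand-in for a Kubernetes object with finalizers (hypothetical model)."""
    name: str
    finalizers: List[str] = field(default_factory=list)
    deletion_timestamp: Optional[str] = None
    infra_torn_down: bool = False


def request_delete(obj: Resource, now: str) -> None:
    # A delete request only marks the object; it stays until finalizers are empty.
    if obj.deletion_timestamp is None:
        obj.deletion_timestamp = now


def infra_controller_reconcile(obj: Resource, own_finalizer: str) -> None:
    # An infrastructure controller reacts to deletionTimestamp alone.
    # It does not look at finalizers owned by other controllers.
    if obj.deletion_timestamp is None:
        return
    if own_finalizer in obj.finalizers:
        obj.infra_torn_down = True
        obj.finalizers.remove(own_finalizer)


def is_gone(obj: Resource) -> bool:
    # The object disappears only once deletion was requested and no finalizers remain.
    return obj.deletion_timestamp is not None and not obj.finalizers


cluster = Resource(
    name="example",
    finalizers=["infra.example/finalizer", "blocker.example/finalizer"],
)
request_delete(cluster, now="2022-11-29T14:13:47Z")
infra_controller_reconcile(cluster, own_finalizer="infra.example/finalizer")

print(cluster.infra_torn_down)  # True: the tear-down ran despite the blocker finalizer
print(is_gone(cluster))         # False: the blocker finalizer only keeps the CR around
```

In this model the blocker finalizer delays removal of the object, but it has no influence on when the infrastructure controller starts tearing things down.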
MachinePool deletion is now handled by capi-garbage-collector.
@nprokopic I assigned this to you because you are currently working on fixing the deletion problem with the VPC, subnets and route tables.
I tested cluster deletion on golem and it seems that the cluster is stuck. The AWS resources were deleted, but the AWSCluster resource is still present with the aws-vpc-operator finalizer.
Cluster definition:
---
apiVersion: v1
data:
  values: |
    aws:
      region: eu-west-2
    bastion:
      enabled: false
    proxy:
      enabled: true
      http_proxy: "http://internal-a1c90e5331e124481a14fb7ad80ae8eb-1778512673.eu-west-2.elb.amazonaws.com:4000"
      https_proxy: "http://internal-a1c90e5331e124481a14fb7ad80ae8eb-1778512673.eu-west-2.elb.amazonaws.com:4000"
      no_proxy: "test-domain.com"
    clusterName: alextest21
    controlPlane:
      replicas: 3
    machinePools:
    - instanceType: m5.xlarge
      maxSize: 10
      minSize: 3
      name: machine-pool0
      rootVolumeSizeGB: 300
      availabilityZones:
      - eu-west-2a
      - eu-west-2b
      - eu-west-2c
    network:
      vpcCIDR: 10.20.0.0/16
      topologyMode: GiantSwarmManaged
      availabilityZoneUsageLimit: 3
      vpcMode: private
      apiMode: private
      dnsMode: private
      subnets:
      - cidrBlock: 10.20.0.0/18
      - cidrBlock: 10.20.64.0/18
      - cidrBlock: 10.20.128.0/18
    organization: giantswarm
kind: ConfigMap
metadata:
  creationTimestamp: null
  labels:
    giantswarm.io/cluster: alextest21
  name: alextest21-userconfig
  namespace: org-giantswarm
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
  name: alextest21
  namespace: org-giantswarm
spec:
  catalog: cluster
  config:
    configMap:
      name: ""
      namespace: ""
    secret:
      name: ""
      namespace: ""
  kubeConfig:
    context:
      name: ""
    inCluster: true
    secret:
      name: ""
      namespace: ""
  name: cluster-aws
  namespace: org-giantswarm
  userConfig:
    configMap:
      name: alextest21-userconfig
      namespace: org-giantswarm
  version: 0.18.0
---
apiVersion: v1
data:
  values: |
    clusterName: alextest21
    organization: giantswarm
kind: ConfigMap
metadata:
  creationTimestamp: null
  labels:
    giantswarm.io/cluster: alextest21
  name: alextest21-default-apps-userconfig
  namespace: org-giantswarm
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
    giantswarm.io/cluster: alextest21
    giantswarm.io/managed-by: cluster
  name: alextest21-default-apps
  namespace: org-giantswarm
spec:
  catalog: cluster
  config:
    configMap:
      name: alextest21-cluster-values
      namespace: org-giantswarm
    secret:
      name: ""
      namespace: ""
  kubeConfig:
    context:
      name: ""
    inCluster: true
    secret:
      name: ""
      namespace: ""
  name: default-apps-aws
  namespace: org-giantswarm
  userConfig:
    configMap:
      name: alextest21-default-apps-userconfig
      namespace: org-giantswarm
  version: 0.11.0
Cluster status:
❯ kubectl tree --context gs-golem -n org-giantswarm cluster alextest21
NAMESPACE       NAME                     READY  REASON   AGE
org-giantswarm  Cluster/alextest21       False  Deleted  62m
org-giantswarm  └─AWSCluster/alextest21  False  Deleted  62m
AWSCluster definition:
❯ kubectl -n org-giantswarm get awscluster alextest21 -o yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  annotations:
    aws.cluster.x-k8s.io/external-resource-gc: "true"
    aws.giantswarm.io/dns-assign-additional-vpc: ""
    aws.giantswarm.io/dns-mode: private
    aws.giantswarm.io/vpc-mode: private
    meta.helm.sh/release-name: alextest21
    meta.helm.sh/release-namespace: org-giantswarm
  creationTimestamp: "2022-11-29T13:44:56Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-11-29T14:13:47Z"
  finalizers:
  - aws-vpc-operator.finalizers.giantswarm.io
  generation: 7
  labels:
    app: cluster-aws
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/version: 0.18.0
    application.giantswarm.io/team: hydra
    cluster.x-k8s.io/cluster-name: alextest21
    cluster.x-k8s.io/watch-filter: capi
    giantswarm.io/cluster: alextest21
    giantswarm.io/organization: giantswarm
    helm.sh/chart: cluster-aws-0.18.0
    release.giantswarm.io/version: 20.0.0-alpha1
  name: alextest21
  namespace: org-giantswarm
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: alextest21
    uid: 09f86b82-64a3-40a8-bbeb-61a49e53f9ab
  resourceVersion: "63496498"
  uid: e8ba6fa0-9f17-449f-9eeb-dd8a9147dd21
spec:
  bastion:
    allowedCIDRBlocks:
    - 0.0.0.0/0
    enabled: false
  controlPlaneEndpoint:
    host: internal-alextest21-apiserver-319032799.eu-west-2.elb.amazonaws.com
    port: 6443
  controlPlaneLoadBalancer:
    crossZoneLoadBalancing: false
    scheme: internal
  identityRef:
    kind: AWSClusterRoleIdentity
    name: default
  network:
    cni:
      cniIngressRules:
      - description: allow AWS CNI traffic across nodes and control plane
        fromPort: -1
        protocol: "-1"
        toPort: -1
    subnets:
    - availabilityZone: eu-west-2a
      cidrBlock: 10.20.0.0/18
      id: subnet-03d44fb2d6dab1563
      isPublic: false
      tags:
        Name: alextest21-subnet-private-eu-west-2a
        github.com/giantswarm/aws-vpc-operator/role: private
        kubernetes.io/cluster/alextest21: shared
        kubernetes.io/role/internal-elb: "1"
    - availabilityZone: eu-west-2b
      cidrBlock: 10.20.64.0/18
      id: subnet-0878489d46a84cd0a
      isPublic: false
      tags:
        Name: alextest21-subnet-private-eu-west-2b
        github.com/giantswarm/aws-vpc-operator/role: private
        kubernetes.io/cluster/alextest21: shared
        kubernetes.io/role/internal-elb: "1"
    - availabilityZone: eu-west-2c
      cidrBlock: 10.20.128.0/18
      id: subnet-0ac7dfc2dd31c7a76
      isPublic: false
      tags:
        Name: alextest21-subnet-private-eu-west-2c
        github.com/giantswarm/aws-vpc-operator/role: private
        kubernetes.io/cluster/alextest21: shared
        kubernetes.io/role/internal-elb: "1"
    vpc:
      availabilityZoneSelection: Ordered
      availabilityZoneUsageLimit: 3
      cidrBlock: 10.20.0.0/16
      id: vpc-0ddc8253300a71fdc
      tags:
        Name: alextest21-vpc
        github.com/giantswarm/aws-vpc-operator/role: common
  region: eu-west-2
  sshKeyName: ssh-key
status:
  conditions:
  - lastTransitionTime: "2022-11-29T14:16:25Z"
    message: 0 of 4 completed
    reason: Deleted
    severity: Info
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-11-29T14:16:23Z"
    reason: Deleted
    severity: Info
    status: "False"
    type: ClusterSecurityGroupsReady
  - lastTransitionTime: "2022-11-29T13:45:07Z"
    status: "True"
    type: DNSZoneReady
  - lastTransitionTime: "2022-11-29T14:16:25Z"
    reason: Deleted
    severity: Info
    status: "False"
    type: InternetGatewayReady
  - lastTransitionTime: "2022-11-29T14:49:30Z"
    reason: Deleted
    severity: Info
    status: "False"
    type: LoadBalancerReady
  - lastTransitionTime: "2022-11-29T14:16:25Z"
    reason: Deleted
    severity: Info
    status: "False"
    type: NatGatewaysReady
  - lastTransitionTime: "2022-11-29T13:45:00Z"
    status: "True"
    type: PrincipalCredentialRetrieved
  - lastTransitionTime: "2022-11-29T13:45:00Z"
    status: "True"
    type: PrincipalUsageAllowed
  - lastTransitionTime: "2022-11-29T14:16:26Z"
    message: Route tables have been deleted
    reason: Deleted
    severity: Info
    status: "False"
    type: RouteTablesReady
  - lastTransitionTime: "2022-11-29T14:16:24Z"
    reason: Deleting
    severity: Info
    status: "False"
    type: SecondaryCidrsReady
  - lastTransitionTime: "2022-11-29T14:16:26Z"
    message: Subnets are being deleted
    reason: Deleting
    severity: Info
    status: "False"
    type: SubnetsReady
  - lastTransitionTime: "2022-11-29T14:13:47Z"
    message: VPC endpoint has been deleted
    reason: Deleted
    severity: Info
    status: "False"
    type: VpcEndpointReady
  - lastTransitionTime: "2022-11-29T14:16:25Z"
    reason: Deleted
    severity: Info
    status: "False"
    type: VpcReady
  failureDomains:
    eu-west-2a:
      controlPlane: true
    eu-west-2b:
      controlPlane: true
    eu-west-2c:
      controlPlane: true
  networkStatus:
    apiServerElb:
      attributes:
        idleTimeout: 600000000000
      availabilityZones:
      - eu-west-2a
      - eu-west-2b
      - eu-west-2c
      dnsName: internal-alextest21-apiserver-319032799.eu-west-2.elb.amazonaws.com
      name: alextest21-apiserver
      scheme: internal
      securityGroupIds:
      - sg-0cd56ef3aba2bf2c5
      subnetIds:
      - subnet-03d44fb2d6dab1563
      - subnet-0878489d46a84cd0a
      - subnet-0ac7dfc2dd31c7a76
      tags:
        Name: alextest21-apiserver
        sigs.k8s.io/cluster-api-provider-aws/cluster/alextest21: owned
        sigs.k8s.io/cluster-api-provider-aws/role: apiserver
    securityGroups:
      apiserver-lb:
        id: sg-0cd56ef3aba2bf2c5
        ingressRule:
        - cidrBlocks:
          - 0.0.0.0/0
          description: Kubernetes API
          fromPort: 6443
          protocol: tcp
          toPort: 6443
        name: alextest21-apiserver-lb
        tags:
          Name: alextest21-apiserver-lb
          sigs.k8s.io/cluster-api-provider-aws/cluster/alextest21: owned
          sigs.k8s.io/cluster-api-provider-aws/role: apiserver-lb
      controlplane:
        id: sg-05fac931d1a5dc866
        ingressRule:
        - description: Kubernetes API
          fromPort: 6443
          protocol: tcp
          sourceSecurityGroupIds:
          - sg-05fac931d1a5dc866
          - sg-0c7285fe03ebc6636
          - sg-0cd56ef3aba2bf2c5
          toPort: 6443
        - description: allow AWS CNI traffic across nodes and control plane
          fromPort: 0
          protocol: "-1"
          sourceSecurityGroupIds:
          - sg-05fac931d1a5dc866
          - sg-0c7285fe03ebc6636
          toPort: 0
        - description: etcd
          fromPort: 2379
          protocol: tcp
          sourceSecurityGroupIds:
          - sg-05fac931d1a5dc866
          toPort: 2379
        - description: etcd peer
          fromPort: 2380
          protocol: tcp
          sourceSecurityGroupIds:
          - sg-05fac931d1a5dc866
          toPort: 2380
        name: alextest21-controlplane
        tags:
          Name: alextest21-controlplane
          sigs.k8s.io/cluster-api-provider-aws/cluster/alextest21: owned
          sigs.k8s.io/cluster-api-provider-aws/role: controlplane
      lb:
        id: sg-03b4aa6a1fe6e1b88
        name: alextest21-lb
        tags:
          Name: alextest21-lb
          kubernetes.io/cluster/alextest21: owned
          sigs.k8s.io/cluster-api-provider-aws/cluster/alextest21: owned
          sigs.k8s.io/cluster-api-provider-aws/role: lb
      node:
        id: sg-0c7285fe03ebc6636
        ingressRule:
        - cidrBlocks:
          - 0.0.0.0/0
          description: Node Port Services
          fromPort: 30000
          protocol: tcp
          toPort: 32767
        - description: allow AWS CNI traffic across nodes and control plane
          fromPort: 0
          protocol: "-1"
          sourceSecurityGroupIds:
          - sg-05fac931d1a5dc866
          - sg-0c7285fe03ebc6636
          toPort: 0
        - description: Kubelet API
          fromPort: 10250
          protocol: tcp
          sourceSecurityGroupIds:
          - sg-05fac931d1a5dc866
          - sg-0c7285fe03ebc6636
          toPort: 10250
        name: alextest21-node
        tags:
          Name: alextest21-node
          sigs.k8s.io/cluster-api-provider-aws/cluster/alextest21: owned
          sigs.k8s.io/cluster-api-provider-aws/role: node
  ready: true
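The manifest is long, so it can help to pull out just the fields that explain why the object is still around. The sketch below is only an illustration: the dictionary is a hand-copied subset of the AWSCluster above (finalizers, deletion timestamp and a few of the conditions), and the helper names `pending_deletions` and `is_stuck` are made up for this example.

```python
from typing import Any, Dict, List

# Hand-copied subset of the AWSCluster manifest shown above (not the full object).
awscluster: Dict[str, Any] = {
    "metadata": {
        "deletionTimestamp": "2022-11-29T14:13:47Z",
        "finalizers": ["aws-vpc-operator.finalizers.giantswarm.io"],
    },
    "status": {
        "conditions": [
            {"type": "RouteTablesReady", "reason": "Deleted", "status": "False"},
            {"type": "SecondaryCidrsReady", "reason": "Deleting", "status": "False"},
            {"type": "SubnetsReady", "reason": "Deleting", "status": "False"},
            {"type": "VpcEndpointReady", "reason": "Deleted", "status": "False"},
            {"type": "VpcReady", "reason": "Deleted", "status": "False"},
        ]
    },
}


def pending_deletions(obj: Dict[str, Any]) -> List[str]:
    """Return the condition types whose reason is still 'Deleting'."""
    conditions = obj.get("status", {}).get("conditions", [])
    return [c["type"] for c in conditions if c.get("reason") == "Deleting"]


def is_stuck(obj: Dict[str, Any]) -> bool:
    """Deletion was requested, but at least one finalizer is still present."""
    meta = obj.get("metadata", {})
    return bool(meta.get("deletionTimestamp")) and bool(meta.get("finalizers"))


print(is_stuck(awscluster))           # True
print(pending_deletions(awscluster))  # ['SecondaryCidrsReady', 'SubnetsReady']
```

For this object, the only conditions that never reached `Deleted` are `SecondaryCidrsReady` and `SubnetsReady`, and the only remaining finalizer is the aws-vpc-operator one.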
aws-vpc-operator logs:
2022-11-29T14:51:01.274Z INFO VPC endpoint is already deleted {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.274Z INFO Deleting route tables {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.274Z INFO Started reconciling route tables deletion {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.274Z INFO Started deleting all route tables {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.274Z INFO Started listing route tables {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.342Z INFO Finished listing route tables {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a", "count": 0}
2022-11-29T14:51:01.342Z INFO Finished deleting all route tables {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.342Z INFO Finished reconciling route tables deletion {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.342Z INFO Deleted route tables {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.342Z INFO Deleting subnets {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a", "subnet-ids": ["subnet-03d44fb2d6dab1563", "subnet-0878489d46a84cd0a", "subnet-0ac7dfc2dd31c7a76"]}
2022-11-29T14:51:01.342Z INFO Started reconciling subnets deletion {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.342Z INFO Started deleting subnets {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a"}
2022-11-29T14:51:01.342Z INFO Deleting subnet {"controller": "awscluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "AWSCluster", "AWSCluster": {"name":"alextest21","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest21", "reconcileID": "0867c7fa-abc7-45c3-a707-09c7029c0c3a", "subnet-id": "subnet-03d44fb2d6dab1563"}
The operator wants to delete a subnet that doesn't exist anymore. I checked AWS and the VPC, subnets and route tables are gone.
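One common way to make this kind of deletion step safe to repeat is to treat "resource not found" as success, so the reconcile can move on and eventually remove the finalizer. The following is a minimal sketch of that pattern and not aws-vpc-operator's actual code (the operator is written in Go, and the root cause here has not been confirmed). `CloudError`, `delete_if_exists` and `fake_delete_subnet` are hypothetical names; `InvalidSubnetID.NotFound` is assumed to be the error code EC2 returns for a missing subnet.

```python
from typing import Callable, Dict, List


class CloudError(Exception):
    """Hypothetical error type carrying a cloud provider error code."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


# Assumption: error code returned when the subnet no longer exists.
NOT_FOUND_CODES = {"InvalidSubnetID.NotFound"}


def delete_if_exists(resource_id: str, delete_one: Callable[[str], None]) -> bool:
    """Delete a resource, treating 'not found' as already deleted.

    Returns True if the delete call succeeded now, False if the resource
    was already gone. Any other error is re-raised for the caller to retry.
    """
    try:
        delete_one(resource_id)
    except CloudError as err:
        if err.code in NOT_FOUND_CODES:
            return False
        raise
    return True


def fake_delete_subnet(subnet_id: str) -> None:
    # Stand-in for a real API call; simulates subnets that are already gone.
    raise CloudError("InvalidSubnetID.NotFound", f"subnet {subnet_id} does not exist")


subnet_ids: List[str] = [
    "subnet-03d44fb2d6dab1563",
    "subnet-0878489d46a84cd0a",
    "subnet-0ac7dfc2dd31c7a76",
]
results: Dict[str, bool] = {
    sid: delete_if_exists(sid, fake_delete_subnet) for sid in subnet_ids
}
print(results)  # every value is False: nothing left to delete, no error raised
```

With this shape, a reconcile loop that finds all three subnets already gone finishes without error instead of retrying the same delete call.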
Earlier today, I created and deleted alextest22 and it has the same behavior as alextest21.
I successfully created and deleted 2 CAPA private clusters on golem today.
Issue
The following resources are not deleted for private clusters: