Open DinaBelova opened 1 month ago
An intermittent issue that is most probably connected to the CAPI provider.
At some point the Cluster and its Machines get stuck in the `Deleting` state, even though the actual infrastructure in AWS has been cleared.
@Kshatrix noticed that when this happens the `AWSCluster` object is absent, even though the `Machines` and `AWSMachines` are still present.
The AWS provider tries to patch the `AWSCluster` and then marks it as `Not ready`:
I0726 16:19:31.225463 1 awscluster_controller.go:208] "Reconciling AWSCluster delete" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="default/aws-cl-1" namespace="default" name="aws-cl-1" reconcileID="da967d9f-4c3d-47a1-953a-cedf44e4d8d0" cluster="default/aws-cl-1"
I0726 16:19:33.955431 1 securitygroups.go:320] "Deleted security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="default/aws-cl-1" namespace="default" name="aws-cl-1" reconcileID="da967d9f-4c3d-47a1-953a-cedf44e4d8d0" cluster="default/aws-cl-1" security-group-id="sg-068b633aae83d2e19" kind="cluster managed"
I0726 16:19:34.432437 1 securitygroups.go:320] "Deleted security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="default/aws-cl-1" namespace="default" name="aws-cl-1" reconcileID="da967d9f-4c3d-47a1-953a-cedf44e4d8d0" cluster="default/aws-cl-1" security-group-id="sg-05fe37ab8f0a3ab15" kind="cluster managed"
I0726 16:19:36.516438 1 vpc.go:550] "Deleted VPC" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="default/aws-cl-1" namespace="default" name="aws-cl-1" reconcileID="da967d9f-4c3d-47a1-953a-cedf44e4d8d0" cluster="default/aws-cl-1" vpc-id="vpc-03b7241ad6eae9ab1"
E0726 16:19:36.632931 1 controller.go:329] "Reconciler error" err="failed to patch AWSCluster default/aws-cl-1: awsclusters.infrastructure.cluster.x-k8s.io \"aws-cl-1\" not found" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="default/aws-cl-1" namespace="default" name="aws-cl-1" reconcileID="da967d9f-4c3d-47a1-953a-cedf44e4d8d0"
I0726 16:19:51.603067 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-md-xczh6-bfjvv" namespace="default" name="aws-cl-1-md-xczh6-bfjvv" reconcileID="c0acf9c4-8be9-413f-a906-483b59563d9f" machine="default/aws-cl-1-md-xczh6-bfjvv" cluster="default/aws-cl-1"
I0726 16:19:52.434829 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-cp-0" namespace="default" name="aws-cl-1-cp-0" reconcileID="dac57c28-e165-472b-b20f-fa0521e4b2f1" machine="default/aws-cl-1-cp-0" cluster="default/aws-cl-1"
I0726 16:19:59.970099 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-cp-0" namespace="default" name="aws-cl-1-cp-0" reconcileID="df4dc07d-5e1b-4d28-88a3-0f30fa7a76f8" machine="default/aws-cl-1-cp-0" cluster="default/aws-cl-1"
I0726 16:19:59.970270 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-md-xczh6-bfjvv" namespace="default" name="aws-cl-1-md-xczh6-bfjvv" reconcileID="120b3a56-1855-4d44-8333-8502e8d04981" machine="default/aws-cl-1-md-xczh6-bfjvv" cluster="default/aws-cl-1"
I0726 16:22:36.109923 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-cp-0" namespace="default" name="aws-cl-1-cp-0" reconcileID="4c0f6242-2f77-4194-b9dc-c1aed2034184" machine="default/aws-cl-1-cp-0" cluster="default/aws-cl-1"
I0726 16:22:36.110149 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-md-xczh6-bfjvv" namespace="default" name="aws-cl-1-md-xczh6-bfjvv" reconcileID="90ca119f-d6c2-457f-9d79-be35c1ae70a8" machine="default/aws-cl-1-md-xczh6-bfjvv" cluster="default/aws-cl-1"
I0726 16:29:30.719506 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-cp-0" namespace="default" name="aws-cl-1-cp-0" reconcileID="d9d8a14a-8e04-43ab-b596-b3bc9083af81" machine="default/aws-cl-1-cp-0" cluster="default/aws-cl-1"
I0726 16:29:30.719543 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-md-xczh6-bfjvv" namespace="default" name="aws-cl-1-md-xczh6-bfjvv" reconcileID="7bba88c3-0555-4536-bb12-d366df08e338" machine="default/aws-cl-1-md-xczh6-bfjvv" cluster="default/aws-cl-1"
I0726 16:32:52.252957 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-md-xczh6-bfjvv" namespace="default" name="aws-cl-1-md-xczh6-bfjvv" reconcileID="2ba19b22-ddb1-4323-aa07-4bd170f5e49b" machine="default/aws-cl-1-md-xczh6-bfjvv" cluster="default/aws-cl-1"
I0726 16:32:52.253183 1 awsmachine_controller.go:198] "AWSCluster or AWSManagedControlPlane is not ready yet" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/aws-cl-1-cp-0" namespace="default" name="aws-cl-1-cp-0" reconcileID="36079ee5-a4ec-407a-a181-1d7a3ca1058f" machine="default/aws-cl-1-cp-0" cluster="default/aws-cl-1"
After that, the process is effectively stuck.
We should keep this issue in mind.
Restarting the controller does not help.
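
To spot this state in a controller log without reading it by hand, something like the following might help. It is only a rough sketch, not part of the provider: it assumes the log uses the `key="value"` line format shown above, and the helper names `parse_fields` and `find_stuck_clusters` are made up for illustration. It looks for the "failed to patch AWSCluster ... not found" error and then collects the AWSMachines that keep reporting that cluster as not ready.

```python
import re

# Matches key="value" pairs in a structured log line, tolerating escaped quotes (\").
KV_RE = re.compile(r'(\S+?)="((?:[^"\\]|\\.)*)"')


def parse_fields(line):
    """Return the key="value" pairs of one log line as a dict (hypothetical helper)."""
    return dict(KV_RE.findall(line))


def find_stuck_clusters(lines):
    """Map each cluster whose AWSCluster patch failed with "not found" to the
    AWSMachines that afterwards still report it as not ready (hypothetical helper)."""
    missing = set()   # clusters whose AWSCluster object was already gone on patch
    waiting = {}      # cluster -> set of AWSMachine names still waiting on it
    for line in lines:
        fields = parse_fields(line)
        err = fields.get("err", "")
        if "failed to patch AWSCluster" in err and "not found" in err:
            missing.add(fields.get("AWSCluster"))
        elif "is not ready yet" in line and fields.get("cluster") in missing:
            waiting.setdefault(fields["cluster"], set()).add(fields.get("AWSMachine"))
    return {cluster: sorted(machines) for cluster, machines in waiting.items()}
```

The function accepts any iterable of lines, so an open file handle of the saved controller log can be passed in directly. A non-empty result only indicates that the log matches the pattern described in this issue; it does not confirm the root cause.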
Created upstream issue kubernetes-sigs/cluster-api-provider-aws#5107 @Kshatrix FYI
Our fix for #217 should provide a workaround for that (but it is a temporary solution).