gavinbunney / terraform-provider-kubectl

Terraform provider to handle raw kubernetes manifest yaml files
https://registry.terraform.io/providers/gavinbunney/kubectl
Mozilla Public License 2.0

kubernetes resource (/apis/...) not found, removing from state #295

Open sivanov-nuodb opened 2 months ago

sivanov-nuodb commented 2 months ago

Environment

EKS version: v1.29
gavinbunney kubectl version: 1.14.0
terraform: v1.5.7

Actual behaviour

Sometimes kubectl_manifest TF resources are incorrectly removed from TF state during refresh, so TF never destroys them. This is even more problematic when a custom destroy-time TF provisioner is attached to the resource, because the provisioner never fires.

Please consider the log snippet below which captures part of the terraform destroy (specifically when the resource state is refreshed):

2024-06-13T14:59:09.982+0300 [TRACE] Completed graph transform *terraform.RootTransformer with new graph:
  kubectl_manifest.karpenter_provisioner[0] - *terraform.NodePlannableResourceInstance
  root - terraform.graphNodeRoot
    kubectl_manifest.karpenter_provisioner[0] - *terraform.NodePlannableResourceInstance
  ------
2024-06-13T14:59:09.982+0300 [TRACE] vertex "kubectl_manifest.karpenter_provisioner (expand)": entering dynamic subgraph
2024-06-13T14:59:09.982+0300 [TRACE] vertex "kubectl_manifest.karpenter_provisioner[0]": starting visit (*terraform.NodePlannableResourceInstance)
2024-06-13T14:59:09.983+0300 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-06-13T14:59:09.985+0300 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.10.1/darwin_amd64/terraform-provider-helm_v2.10.1_x5 pid=68043
2024-06-13T14:59:09.986+0300 [DEBUG] provider: plugin exited
2024-06-13T14:59:09.986+0300 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/helm\"] (close)": visit complete
2024-06-13T14:59:09.986+0300 [TRACE] readResourceInstanceState: reading state for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.986+0300 [TRACE] readResourceInstanceState: reading state for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.986+0300 [TRACE] upgradeResourceState: schema version of kubectl_manifest.karpenter_provisioner[0] is still 1; calling provider "kubectl" for any other minor fixups
2024-06-13T14:59:09.986+0300 [TRACE] GRPCProvider: UpgradeResourceState
2024-06-13T14:59:09.986+0300 [TRACE] upgradeResourceState: schema version of kubectl_manifest.karpenter_node_group_template[0] is still 1; calling provider "kubectl" for any other minor fixups
2024-06-13T14:59:09.986+0300 [TRACE] GRPCProvider: UpgradeResourceState
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to prevRunState for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to prevRunState for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to refreshState for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to refreshState for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_node_group_template[0]
kubectl_manifest.karpenter_provisioner[0]: Refreshing state... [id=/apis/karpenter.sh/v1alpha5/provisioners/default]
kubectl_manifest.karpenter_node_group_template[0]: Refreshing state... [id=/apis/karpenter.k8s.aws/v1alpha1/awsnodetemplates/default]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResourceInstance.refresh for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResourceInstance.refresh for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] GRPCProvider: ReadResource
2024-06-13T14:59:09.987+0300 [TRACE] GRPCProvider: ReadResource
2024-06-13T14:59:09.988+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default Unstructed YAML: map[apiVersion:karpenter.k8s.aws/v1alpha1 kind:AWSNodeTemplate metadata:map[name:default] spec:map[instanceProfile:AmazonEKSTFKarpenterNodeRole-eks-dbaas-sivanovdestroytest securityGroupSelector:map[karpenter.sh/discovery:eks-dbaas-sivanovdestroytest] subnetSelector:map[karpenter.sh/discovery:eks-dbaas-sivanovdestroytest] tags:map[dbaas.nuodb.com/cluster:eks-dbaas-sivanovdestroytest]]]
2024-06-13T14:59:09.988+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default Unstructed YAML: map[apiVersion:karpenter.sh/v1alpha5 kind:Provisioner metadata:map[name:default] spec:map[consolidation:map[enabled:true] limits:map[resources:map[cpu:16 memory:32Gi]] providerRef:map[name:default] requirements:[map[key:node.kubernetes.io/instance-type operator:In values:[t3.medium t4g.medium t3a.medium]] map[key:karpenter.sh/capacity-type operator:In values:[spot]]] ttlSecondsUntilExpired:604800]]
2024-06-13T14:59:09.992+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default fetch from kubernetes
2024-06-13T14:59:09.994+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default fetch from kubernetes
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.k8s.aws/v1alpha1/awsnodetemplates/default) not found, removing from state
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.sh/v1alpha5/provisioners/default) not found, removing from state
2024-06-13T14:59:10.525+0300 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an unexpected new value for kubectl_manifest.karpenter_node_group_template[0] during refresh.
      - Root resource was present, but now absent
2024-06-13T14:59:10.525+0300 [WARN]  Provider "registry.terraform.io/gavinbunney/kubectl" produced an unexpected new value for kubectl_manifest.karpenter_provisioner[0] during refresh.
      - Root resource was present, but now absent

Notice that TF resources kubectl_manifest.karpenter_provisioner[0] and kubectl_manifest.karpenter_node_group_template[0] are reported as not found and removed from state. As a result, TF never selects them for destruction during the destroy phase, which leaves them behind in the cluster.

Expected behavior

The Kubernetes resource should not be left behind.

Troubleshooting

After running terraform destroy, the Kubernetes resource still exists:

$ kubectl get provisioners.karpenter.sh
NAME      AGE
default   87m

The relevant provider code treats 404 (Not Found) and 410 (Gone) errors identically, as resource not found. I suspected that this might be problematic, so I looked in the K8s audit logs.
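To make the concern concrete, here is a minimal sketch of the refresh decision as described above. It is written in Python for illustration only and is not the provider's code; `Action` and `classify_read_status` are hypothetical names, and the handling of statuses other than 404 and 410 is an assumption.

```python
from enum import Enum
from http import HTTPStatus


class Action(Enum):
    KEEP = "keep in state"
    REMOVE_FROM_STATE = "remove from state"
    FAIL = "return error"


def classify_read_status(status_code: int) -> Action:
    """Hypothetical model of the refresh decision described in this issue."""
    if status_code == HTTPStatus.OK:
        return Action.KEEP
    # Behaviour reported in this issue: 404 and 410 are handled identically.
    if status_code in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        return Action.REMOVE_FROM_STATE
    # Assumption: any other status is surfaced as an error instead.
    return Action.FAIL


if __name__ == "__main__":
    for code in (200, 404, 410, 500):
        print(code, "->", classify_read_status(code).value)
```

Under this model, a single spurious 404 during refresh is enough to drop a live resource from state, which matches the symptom in the log above.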

The K8s API server audit trail in CloudWatch shows no requests for this resource around 11:59:10 (the terraform client runs in the GMT+3 timezone, so this corresponds to 14:59:10 in the log above). This makes me think that this is either some kind of client-side caching problem or that an incorrect URI is used.
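For anyone correlating the two logs, the conversion can be checked with a short Python sketch. It assumes the audit trail timestamps are in UTC; `to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc(log_timestamp: str) -> datetime:
    """Convert a Terraform log timestamp (with numeric offset) to UTC."""
    local = datetime.strptime(log_timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # Timestamp of the "not found, removing from state" warning above.
    ts = to_utc("2024-06-13T14:59:10.525+0300")
    print(ts.isoformat(timespec="milliseconds"))  # 2024-06-13T11:59:10.525+00:00
```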

Screenshot 2024-06-13 at 15 06 46

The resource was created at 10:46:46.530:

Screenshot 2024-06-13 at 15 07 27

The only NOK response (404) was returned before the resource had been created.

Screenshot 2024-06-13 at 15 07 13

This doesn't seem to be specific to cluster-scoped resources, since the same behaviour also shows up for other TF resources, including namespaced ones:

$ grep "not found, removing from state" destroy.log
2024-06-13T14:55:26.408+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/kube-system/podlogses/kube) not found, removing from state
2024-06-13T14:55:26.409+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/grafanaagents/agent) not found, removing from state
2024-06-13T14:55:26.508+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/integrations/kube-events) not found, removing from state
2024-06-13T14:55:26.885+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/prometheus/podlogses/logs-prometheus) not found, removing from state
2024-06-13T14:55:26.885+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/platform-system/podlogses/logs-platform-system) not found, removing from state
2024-06-13T14:55:27.026+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/logsinstances/loki) not found, removing from state
2024-06-13T14:55:27.026+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/karpenter/podlogses/logs-karpenter) not found, removing from state
2024-06-13T14:55:27.028+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/podlogses/loki) not found, removing from state
2024-06-13T14:55:27.678+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/snapshot.storage.k8s.io/v1/volumesnapshotclasses/snap-ebs-delete) not found, removing from state
2024-06-13T14:55:30.121+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:30 [WARN] kubernetes resource (/apis/cert-manager.io/v1/clusterissuers/letsencrypt) not found, removing from state
2024-06-13T14:55:30.313+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:30 [WARN] kubernetes resource (/apis/cert-manager.io/v1/namespaces/platform-system/certificates/haproxy-tls-cert) not found, removing from state
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.k8s.aws/v1alpha1/awsnodetemplates/default) not found, removing from state
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.sh/v1alpha5/provisioners/default) not found, removing from state
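The grep output above can also be summarised by scope with a small Python sketch. The helper names (`removed_resources`, `is_namespaced`) are hypothetical, and the regular expression only assumes the warning format shown in this log.

```python
import re

# Matches the provider warning shown above and captures the API path.
PATTERN = re.compile(
    r"kubernetes resource \((?P<path>/[^)\s]+)\) not found, removing from state"
)


def removed_resources(lines):
    """Return the API paths reported as removed from state."""
    return [m.group("path") for line in lines if (m := PATTERN.search(line))]


def is_namespaced(path: str) -> bool:
    """True if the path has the form .../namespaces/<ns>/<kind>/<name>."""
    parts = path.strip("/").split("/")
    return "namespaces" in parts and len(parts) - parts.index("namespaces") > 2


if __name__ == "__main__":
    # Two lines copied from the log above; in practice, iterate over destroy.log.
    sample = [
        "[WARN] kubernetes resource (/apis/cert-manager.io/v1/namespaces/"
        "platform-system/certificates/haproxy-tls-cert) not found, removing from state",
        "[WARN] kubernetes resource (/apis/karpenter.sh/v1alpha5/provisioners/default)"
        " not found, removing from state",
    ]
    for path in removed_resources(sample):
        scope = "namespaced" if is_namespaced(path) else "cluster-scoped"
        print(f"{scope}: {path}")
```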

Mentioned in https://github.com/gavinbunney/terraform-provider-kubectl/issues/270#issuecomment-1622040258