gavinbunney / terraform-provider-kubectl

Terraform provider to handle raw kubernetes manifest yaml files
https://registry.terraform.io/providers/gavinbunney/kubectl
Mozilla Public License 2.0

Terraform state is updated when update apply fails #265

Open RoryCrispin opened 1 year ago

RoryCrispin commented 1 year ago

When the provider fails while applying an update to a kubectl_manifest resource, the change is still persisted in the Terraform state. Subsequent plans then generate no changes, and the inconsistency remains silently present.

To mitigate, the administrator must manually identify every case where the state has become out of sync and force the provider to update the resource by making a superfluous change to the YAML definition, such as adding a tmp annotation.

Steps to reproduce:

Using Rancher Desktop as an example

With the definition:

terraform {
  required_version = ">= 0.13"

  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}

provider "kubectl" {
  host                   = "127.0.0.1:6443"
  load_config_file       = true
  config_context = "rancher-desktop"
  insecure = true
}

resource "kubectl_manifest" "test" {
    yaml_body = <<YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    tmp: one
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
YAML
}
  1. Run terraform init, then run terraform apply and confirm it to create the resource
  2. Update spec.replicas to 2 and the annotation to tmp: two
  3. Run terraform apply to generate the plan, but don't yet type 'yes' to run it
  4. Simulate a network partition somehow. As I'm using a local cluster, I will just shut it down
  5. Type 'yes' to apply the plan
  6. Observe that the apply fails - and that terraform state pull shows that replicas: 2 and tmp: two were persisted to the TF state
  7. Resolve the network partition
  8. Observe that the replica count and annotation are still 1/one on the cluster
  9. Run terraform plan and observe that the provider reports 'no changes'

Workaround: apply another change to the YAML and apply it, then reverse it.

  1. Change the replicas and annotation to 3/three
  2. Run terraform apply and confirm it
  3. Observe that the plan shows a diff from 2->3 (see the plan output below), rather than the 1->3 that will actually be applied to the cluster
  4. Change them back to 2 and apply - you are now at the desired state.
    # kubectl_manifest.test will be updated in-place
    ~ resource "kubectl_manifest" "test" {
        id                      = "/apis/apps/v1/namespaces/default/deployments/nginx-deployment"
        name                    = "nginx-deployment"
      ~ yaml_body               = (sensitive value)
      ~ yaml_body_parsed        = <<-EOT
            apiVersion: apps/v1
            kind: Deployment
            metadata:
              annotations:
          -     tmp: two
          +     tmp: three
              name: nginx-deployment
            spec:
          -   replicas: 2
          +   replicas: 3
              selector:
                matchLabels:
                  app: nginx
              template:
                metadata:
                  labels:
                    app: nginx
                spec:
                  containers:
                  - image: nginx:1.14.2
                    name: nginx
                    ports:
                    - containerPort: 80
        EOT
        # (13 unchanged attributes hidden)
    }

Shutting down the cluster is, of course, a contrived example. This bug was actually found on a real cluster: the CI worker's K8s credentials expired during a long task before the TF apply, causing an Unauthorized response from K8s.

I suspect this bug is related to the following section of the documentation: https://developer.hashicorp.com/terraform/plugin/framework/diagnostics#how-errors-affect-state

How Errors Affect State

Returning an error diagnostic does not stop the state from being updated. Terraform will still persist the returned state even when an error diagnostic is returned with it. This is to allow Terraform to persist the values that have already been modified when a resource modification requires multiple API requests or an API request fails after an earlier one succeeded.

When returning error diagnostics, we recommend resetting the state in the response to the prior state available in the configuration.
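
For illustration, here is a minimal sketch of that recommendation, assuming a plugin-framework-style Update function. manifestModel and applyManifest are hypothetical stand-ins rather than this provider's actual code (which may be built on a different SDK), so the real fix would look somewhat different, but the shape of it is: reset the response state to the prior state before returning the error.

// Hypothetical plugin-framework-style Update, for illustration only: on a
// failed apply, reset the response state to the prior state so the planned
// values are not persisted and the next plan still shows the pending change.
package provider

import (
    "context"
    "errors"

    "github.com/hashicorp/terraform-plugin-framework/resource"
    "github.com/hashicorp/terraform-plugin-framework/types"
)

// manifestModel is a stand-in state model, not the provider's real schema.
type manifestModel struct {
    ID       types.String `tfsdk:"id"`
    YAMLBody types.String `tfsdk:"yaml_body"`
}

// applyManifest stands in for the call that applies the YAML to the cluster.
func applyManifest(ctx context.Context, yaml string) error {
    return errors.New("connection refused") // e.g. the simulated network partition
}

type manifestResource struct{}

func (r *manifestResource) Update(ctx context.Context, req resource.UpdateRequest, resp *resource.UpdateResponse) {
    var plan manifestModel
    resp.Diagnostics.Append(req.Plan.Get(ctx, &plan)...)
    if resp.Diagnostics.HasError() {
        return
    }

    if err := applyManifest(ctx, plan.YAMLBody.ValueString()); err != nil {
        // Returning only the error diagnostic would still persist the planned
        // values (the behaviour reported in this issue). Resetting the response
        // state to the prior state keeps the change visible in future plans.
        resp.State.Raw = req.State.Raw
        resp.Diagnostics.AddError("failed to apply manifest", err.Error())
        return
    }

    // Only persist the planned values once the apply has succeeded.
    resp.Diagnostics.Append(resp.State.Set(ctx, &plan)...)
}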