gavinbunney / terraform-provider-kubectl

Terraform provider to handle raw kubernetes manifest yaml files
https://registry.terraform.io/providers/gavinbunney/kubectl
Mozilla Public License 2.0

Terraform state is updated when update apply fails #265

Open RoryCrispin opened 1 year ago

RoryCrispin commented 1 year ago

When the provider fails while applying an update to a kubectl_manifest resource, the change is still persisted in the Terraform state. Subsequent plans then generate no changes, and the inconsistency remains silently present.

To mitigate, the administrator must manually identify every case where the state has become out of sync and force the provider to update the resource by making a superfluous change to the YAML definition, such as adding a tmp annotation.

Steps to reproduce:

Using Rancher Desktop as an example

With the definition:

terraform {
  required_version = ">= 0.13"

  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}

provider "kubectl" {
  host                   = "127.0.0.1:6443"
  load_config_file       = true
  config_context = "rancher-desktop"
  insecure = true
}

resource "kubectl_manifest" "test" {
    yaml_body = <<YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    tmp: one
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
YAML
}
  1. Run terraform init, then run terraform apply and confirm it to create the resource
  2. Update spec.replicas to 2 and the annotation to tmp: two
  3. Run terraform apply to generate the plan, but don't yet type 'yes' to run it
  4. Simulate a network partition somehow. As I'm using a local cluster, I will just shut it down
  5. Type 'yes' to apply the plan
  6. Observe that the apply fails - and that terraform state pull shows that replicas: 2 and tmp: two were persisted to the TF state
  7. Resolve the network partition
  8. Observe that the replica count and annotation are still 1/one on the cluster
  9. Run terraform plan and observe that the provider reports 'no changes'

Workaround: apply another change to the YAML and apply it, then reverse it.

  1. Change the replicas and annotation to 3/three
  2. Run terraform apply and confirm it
  3. Observe that the plan shows a diff from 2->3 (see the plan output below), rather than the 1->3 that will actually be applied to the cluster
  4. Change them back to 2 and apply - you are now at the desired state.
    # kubectl_manifest.test will be updated in-place
    ~ resource "kubectl_manifest" "test" {
        id                      = "/apis/apps/v1/namespaces/default/deployments/nginx-deployment"
        name                    = "nginx-deployment"
      ~ yaml_body               = (sensitive value)
      ~ yaml_body_parsed        = <<-EOT
            apiVersion: apps/v1
            kind: Deployment
            metadata:
              annotations:
          -     tmp: two
          +     tmp: three
              name: nginx-deployment
            spec:
          -   replicas: 2
          +   replicas: 3
              selector:
                matchLabels:
                  app: nginx
              template:
                metadata:
                  labels:
                    app: nginx
                spec:
                  containers:
                  - image: nginx:1.14.2
                    name: nginx
                    ports:
                    - containerPort: 80
        EOT
        # (13 unchanged attributes hidden)
    }

Shutting down the cluster is, of course, a contrived example. This bug was actually found on a real cluster: the CI worker's K8s credentials expired during a long task before the TF apply, causing an Unauthorized response from K8s.

I suspect this bug is related to the following section of the documentation: https://developer.hashicorp.com/terraform/plugin/framework/diagnostics#how-errors-affect-state

How Errors Affect State

Returning an error diagnostic does not stop the state from being updated. Terraform will still persist the returned state even when an error diagnostic is returned with it. This is to allow Terraform to persist the values that have already been modified when a resource modification requires multiple API requests or an API request fails after an earlier one succeeded.

When returning error diagnostics, we recommend resetting the state in the response to the prior state available in the configuration.
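
For illustration, here is a minimal sketch of that recommendation, assuming a plugin-framework-style Update function. manifestModel and applyManifest are hypothetical stand-ins rather than this provider's actual code (which may be built on a different SDK), so the real fix would look somewhat different, but the shape of it is: reset the response state to the prior state before returning the error.

// Hypothetical plugin-framework-style Update, for illustration only: on a
// failed apply, reset the response state to the prior state so the planned
// values are not persisted and the next plan still shows the pending change.
package provider

import (
    "context"
    "errors"

    "github.com/hashicorp/terraform-plugin-framework/resource"
    "github.com/hashicorp/terraform-plugin-framework/types"
)

// manifestModel is a stand-in state model, not the provider's real schema.
type manifestModel struct {
    ID       types.String `tfsdk:"id"`
    YAMLBody types.String `tfsdk:"yaml_body"`
}

// applyManifest stands in for the call that applies the YAML to the cluster.
func applyManifest(ctx context.Context, yaml string) error {
    return errors.New("connection refused") // e.g. the simulated network partition
}

type manifestResource struct{}

func (r *manifestResource) Update(ctx context.Context, req resource.UpdateRequest, resp *resource.UpdateResponse) {
    var plan manifestModel
    resp.Diagnostics.Append(req.Plan.Get(ctx, &plan)...)
    if resp.Diagnostics.HasError() {
        return
    }

    if err := applyManifest(ctx, plan.YAMLBody.ValueString()); err != nil {
        // Returning only the error diagnostic would still persist the planned
        // values (the behaviour reported in this issue). Resetting the response
        // state to the prior state keeps the change visible in future plans.
        resp.State.Raw = req.State.Raw
        resp.Diagnostics.AddError("failed to apply manifest", err.Error())
        return
    }

    // Only persist the planned values once the apply has succeeded.
    resp.Diagnostics.Append(resp.State.Set(ctx, &plan)...)
}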