When the provider fails while applying an update to a kubectl resource, the change is still persisted in the Terraform state. Subsequent plans then generate no changes, and the inconsistency remains silently present.
To mitigate, the administrator must manually identify every case where the state has become out of sync and force a change, for example by making a superfluous edit to the YAML definition (such as adding a `tmp` annotation) so that the provider updates the resource.
Shutting down the cluster is, of course, a contrived example. This bug was actually found on a real cluster: the CI worker's K8s credentials expired during a long task before the `terraform apply`, causing an Unauthorized response from K8s.
> **How Errors Affect State**
>
> Returning an error diagnostic does not stop the state from being updated. Terraform will still persist the returned state even when an error diagnostic is returned with it. This is to allow Terraform to persist the values that have already been modified when a resource modification requires multiple API requests or an API request fails after an earlier one succeeded.
>
> When returning error diagnostics, we recommend resetting the state in the response to the prior state available in the configuration.
Steps to reproduce (using Rancher Desktop as an example):

1. With the definition, run `terraform init`, then run and apply `terraform apply` to create the resource.
2. Change to `spec.replicas: 2` and the annotation to `tmp: two`.
3. Run `terraform apply` to generate the plan, but don't yet type 'yes' to run it.
4. Shut down the cluster, then type 'yes'; the apply fails with an error.
5. `terraform state pull` shows that `replicas: 2` and `tmp: two` were persisted to TF state.
6. Run `terraform plan` and observe that 'no changes' are reported by the provider.

Workaround: apply another change to the YAML and `terraform apply` it, then reverse it and apply again.
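The original YAML definition is not reproduced above. For reference, a minimal `kubectl_manifest` definition of the shape the steps assume could look like the following (the resource name, labels, and image are illustrative, not taken from the original report):

```hcl
resource "kubectl_manifest" "example" {
  yaml_body = <<-YAML
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example
      annotations:
        tmp: "one"   # changed to "two" in the repro steps
    spec:
      replicas: 1    # changed to 2 in the repro steps
      selector:
        matchLabels:
          app: example
      template:
        metadata:
          labels:
            app: example
        spec:
          containers:
            - name: nginx
              image: nginx:1.25
  YAML
}
```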
I suspect this bug is related to the following line of documentation: https://developer.hashicorp.com/terraform/plugin/framework/diagnostics#how-errors-affect-state