As of Terraform 0.13, we can no longer access any Terraform variables during the destroy-time provisioner (see #55). We are currently using this to SSH into the management node to call a clean-up script to delete hanging nodes.
This PR changes the destroy provisioner to use the cloud CLI locally to delete hanging resources (via a script called cleanup.sh) instead of SSHing to the management node.
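A minimal sketch of the changed provisioner, for context (the `null_resource` name is illustrative; `cleanup.sh` is the script added in this PR):

```hcl
resource "null_resource" "cluster_cleanup" {
  provisioner "local-exec" {
    when = destroy
    # Since Terraform 0.13, destroy-time provisioners may only reference
    # self, count.index, and each.key, so the command must be a literal
    # string rather than interpolate any variables.
    command = "./cleanup.sh"
  }
}
```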
The potential issue is that we have to assume the locally running CLI's default configuration has permission to delete the resources. There seems to be no way for Terraform to pass the credentials it is using into the script.
On Google we can pull down a key for the cluster-internal service account with `gcloud iam service-accounts keys create`, but even this depends on the default config being able to pull down that SA.
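For reference, that would look roughly like the following (the service account name and project are illustrative, and this assumes the default `gcloud` config is allowed to create keys for that SA):

```shell
# Hypothetical example: download a key for the cluster-internal SA.
gcloud iam service-accounts keys create key.json \
  --iam-account=cluster-admin@my-project.iam.gserviceaccount.com
```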
If the `cleanup.sh` script fails to destroy the resources, then `terraform destroy` will fail with an error message. The failure should therefore not go unnoticed; we will just have to document how to resolve it.
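To make that failure mode concrete, here is a hypothetical skeleton for `cleanup.sh` (the `delete_node` helper is a stand-in for the real cloud CLI call): `set -e` ensures any failed deletion exits non-zero, which in turn makes `terraform destroy` fail loudly instead of silently leaving hanging nodes.

```shell
#!/usr/bin/env sh
set -eu

# Placeholder for the real aws/gcloud/oci deletion call; a real failure
# here would abort the script with a non-zero exit status.
delete_node() {
  echo "deleting node: $1"
}

for node in "$@"; do
  delete_node "$node"
done
```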
Pros:

- Allows us to update to the latest version of Terraform.
- We no longer have to store a private key with admin permissions in the Terraform state.

Cons:

- The credentials for the cloud CLI (`aws`/`gcloud`/`oci`) must be set up correctly at destroy time; Terraform can't guarantee this.

Mitigations:

- Using an (un)installer script would let us ensure the credentials are set up correctly. All cloud providers now have web shells, so we can expect a consistent interface.
I'm going to go ahead with this change. It's needed in order to use any modern version of Terraform, and I think we can manage the cleanup, either via the changes in this PR or some other future method.
Does anyone have any thoughts on this solution?