As of Terraform 0.13, we can no longer access any Terraform variables during the destroy-time provisioner (see #55). We are currently using this to SSH into the management node to call a clean-up script to delete hanging nodes.
This PR changes the destroy provisioner to use the cloud CLI locally to delete hanging resources (via a script called cleanup.sh) instead of SSHing to the management node.
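A minimal sketch of the changed provisioner, for context (the `null_resource` name is illustrative; `cleanup.sh` is the script added in this PR):

```hcl
resource "null_resource" "cluster_cleanup" {
  provisioner "local-exec" {
    when = destroy
    # Since Terraform 0.13, destroy-time provisioners may only reference
    # self, count.index, and each.key, so the command must be a literal
    # string rather than interpolate any variables.
    command = "./cleanup.sh"
  }
}
```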
The potential issue is that we have to assume the locally running CLI's default configuration has permission to delete the resources. There seems to be no way for Terraform to pass the credentials it is using into the script.
On Google we can pull down a key for the cluster-internal service account with `gcloud iam service-accounts keys create`, but even this depends on the default config being able to pull down that SA.
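For reference, that would look roughly like the following (the service account name and project are illustrative, and this assumes the default `gcloud` config is allowed to create keys for that SA):

```shell
# Hypothetical example: download a key for the cluster-internal SA.
gcloud iam service-accounts keys create key.json \
  --iam-account=cluster-admin@my-project.iam.gserviceaccount.com
```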
If the `cleanup.sh` script fails to destroy the resources, then `terraform destroy` will fail with an error message. The failure should therefore not go unnoticed; we will just have to document how to resolve it.
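To make that failure mode concrete, here is a hypothetical skeleton for `cleanup.sh` (the `delete_node` helper is a stand-in for the real cloud CLI call): `set -e` ensures any failed deletion exits non-zero, which in turn makes `terraform destroy` fail loudly instead of silently leaving hanging nodes.

```shell
#!/usr/bin/env sh
set -eu

# Placeholder for the real aws/gcloud/oci deletion call; a real failure
# here would abort the script with a non-zero exit status.
delete_node() {
  echo "deleting node: $1"
}

for node in "$@"; do
  delete_node "$node"
done
```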
Pros:

- Allows us to update to the latest version of Terraform.
- We no longer have to store a private key with admin permissions in the Terraform state.

Cons:

- The credentials for the cloud CLI (`aws`/`gcloud`/`oci`) must be set up correctly at destroy time; Terraform can't guarantee this.

Mitigations:

- Using an (un)installer script would let us ensure the credentials are set up correctly. All cloud providers now have web shells, so we can expect a consistent interface.
I'm going to go ahead with this change. It's needed in order to use any modern version of Terraform, and I think we can manage the cleanup, either via the changes in this PR or some other future method.
Does anyone have any thoughts on this solution?