GalleyBytes / terraform-operator

A Kubernetes CRD to handle terraform operations
http://tf.galleybytes.com
Apache License 2.0

deleting a Terraform should delete any pending apply jobs first #22

Closed (jstrachan closed this 3 years ago)

jstrachan commented 3 years ago

we're using the most excellent terraform-operator on the Jenkins X project as a way to run multiple system tests on each Pull Request build. It's working really well so far. We'd love to extend that further and use the terraform-operator inside Jenkins X's preview environments so that it's super easy to get automated preview environments via Terraform (e.g. spin up preview k8s clusters and associated cloud infrastructure for each PR).

However we've hit a minor issue.

We're using the postrunScript mechanism like this example: https://github.com/jstrachan/jxr-versions/blob/tf-operator/.lighthouse/jenkins-x/bdd/terraform.yaml#L54 to connect to the BDD test kubernetes cluster and run tests inside that cluster and report back the success/failure.

The problem we have is that if we trigger another BDD test of the same Pull Request (e.g. via a new commit or by triggering a retest), we garbage collect old tests via the jx test plugin, which essentially performs the following:

kubectl delete terraform $the-terraform-for-the-old-build

this generally works OK, except when the apply job is still running (e.g. the terraform apply has completed and the test is now running in the postrunScript).

The problem is that the terraform destroy then runs and destroys the cluster, which usually makes the postrunScript fail. The Job then re-runs its pod, which brings back the cluster we've just removed ;)

To work around this it would be awesome if we could add a flag to eagerly remove any pending apply Job before running the destroy Job. Then we don't have old cloud infrastructure coming back again which then needs a manual garbage collect.

I'm happy to submit a Pull Request - I just wanted to double check this all sounds fine to you.

I'm thinking of adding a deleteApplyJobOnDelete flag or something like that which, if enabled, would remove the apply Job if it's not completed before running the destroy Job
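To make the proposal a bit more concrete, here is a rough sketch of the selection logic such a flag might drive. It is only an illustration under stated assumptions: the `Job` struct, its `Kind` values and the `applyJobsToDelete` helper are hypothetical stand-ins, not the operator's real types or API, and a real implementation would go through the Kubernetes client rather than an in-memory slice.

```go
package main

import "fmt"

// Job is a simplified, hypothetical stand-in for a Kubernetes Job owned by
// a Terraform resource. It is not a type from terraform-operator.
type Job struct {
	Name      string
	Kind      string // "apply" or "destroy" (assumed labels for this sketch)
	Completed bool
}

// applyJobsToDelete returns the names of apply Jobs that would be removed
// before the destroy Job starts. With the flag off it returns nothing,
// which corresponds to the behaviour described in this issue.
func applyJobsToDelete(jobs []Job, deleteApplyJobOnDelete bool) []string {
	var names []string
	if !deleteApplyJobOnDelete {
		return names
	}
	for _, j := range jobs {
		// Only unfinished apply Jobs can re-run their pod and recreate
		// infrastructure after the destroy has happened.
		if j.Kind == "apply" && !j.Completed {
			names = append(names, j.Name)
		}
	}
	return names
}

func main() {
	jobs := []Job{
		{Name: "pr-123-apply", Kind: "apply", Completed: false},
		{Name: "pr-122-apply", Kind: "apply", Completed: true},
	}
	fmt.Println(applyJobsToDelete(jobs, true))
	fmt.Println(applyJobsToDelete(jobs, false))
}
```

Keeping the flag off by default in a sketch like this means existing Terraform resources would behave exactly as they do today, and only users who opt in get the eager cleanup.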

isaaguilar commented 3 years ago

I'm humbled and flattered jenkins-x is using this project. This use case here makes a lot of sense. I'll take a look at the implementation soon.

jstrachan commented 3 years ago

our BDD tests got switched over to the operator yesterday and so far things are working great ;)

the terraform-*.yaml files are the main Terraform resources used in testing: https://github.com/jenkins-x/jx3-versions/tree/master/.lighthouse/jenkins-x/bdd

then the script actually triggers the terraform operator: https://github.com/jenkins-x/jx3-versions/blob/master/.lighthouse/jenkins-x/bdd/terraform-ci.sh#L200-L210

typically Jenkins X BDD tests involve creating 2 git repositories: one for terraform/infra and one for helm stuff (used by the git operator installed via a helm chart via terraform). The nice thing about the terraform operator + the Terraform resource is that it's just declarative configuration whether a new repo is created per BDD test / PR or whether we just reuse the same terraform repo, e.g. the kube bdd test just uses a simple GKE based terraform repository for terraform

jstrachan commented 3 years ago

btw we're using this little CLI tool jx test to create the Terraform resource and tail the log of the operator job to then report BDD tests in pipelines nicely so developers get feedback + logs of the terraform + the actual test running and then pass/fail the pipeline etc.