aztfmod / rover

The rover is a docker container in charge of the deployment of the Terraform platform engineering for Azure
MIT License

Rover doesn't upload tfstate file when terraform apply fails #192

Closed bobalong79 closed 2 years ago

bobalong79 commented 3 years ago

Hi guys

We are fairly new to using the rover and are considering moving to production quite soon.

We've encountered a few situations where terraform apply fails and the rover process just exits (on line 49 of the script linked below) without calling the upload_tfstate function: https://github.com/aztfmod/rover/blob/19fbcb92a72dff18ad098ae7ac1825433c62699f/scripts/tfstate_azurerm.sh#L49-L51

The apply function calls the error function, which then exits the process (in our case with code 2001): https://github.com/aztfmod/rover/blob/19fbcb92a72dff18ad098ae7ac1825433c62699f/scripts/tfstate_azurerm.sh#L374

This can leave some Azure resources created but not reflected in the remote tfstate file, so if we then try to destroy the half-created landing zone, the destroy doesn't work.
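To illustrate the failure mode described above, here is a minimal sketch in bash. It is not rover's actual code: apply_step, error and upload_tfstate are simplified stand-ins, and the EXIT trap is only one possible way to make sure an upload step still runs when the error helper exits. Whether uploading a partially applied state is the right behaviour for rover is a design question this sketch does not answer.

```shell
#!/usr/bin/env bash
# Sketch only: these functions are stand-ins, not rover's real implementations.

UPLOAD_LOG="$(mktemp)"

# Stand-in for the upload step; it only records that it ran.
upload_tfstate() {
    echo "uploaded" >> "${UPLOAD_LOG}"
}

# Stand-in for an error helper that terminates the current process.
error() {
    echo "Error: $1" >&2
    exit 1
}

# Stand-in for the apply step; the argument simulates terraform's exit code.
apply_step() {
    local simulated_rc="$1"
    if [ "${simulated_rc}" -ne 0 ]; then
        error "terraform apply failed"
    fi
}

# Without a trap: the exit inside error() means upload_tfstate is never reached.
# The function body is a subshell so the exit does not end the calling shell.
run_without_trap() (
    apply_step "$1"
    upload_tfstate
)

# With an EXIT trap: upload_tfstate runs even when error() exits.
run_with_trap() (
    trap upload_tfstate EXIT
    apply_step "$1"
)
```

Calling run_without_trap 1 leaves the log empty, while run_with_trap 1 records one upload before the subshell exits with a non-zero status.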

Please can you advise if it's possible to configure rover to update the remote tfstate as the apply is happening (after each change), so that when the apply fails we at least have consistency between the remote state and the Azure resources that were created, even if the whole apply can't be completed? This would allow us to recover from such failures through automation rather than by manually intervening in the Azure portal.

Thank you kindly

LaurentLesle commented 2 years ago

Hi @bobalong79, we have a mechanism in rover that is supposed to recover the bootstrap process of the launchpad when an error occurs during the apply. When rover runs again, it should detect the previous execution from the local rover cache and re-execute the apply, which will end up uploading the tfstate to the storage account. For that reason we recommend deploying the launchpad from VS Code, as this recovery mechanism is only available when running rover in VS Code and is not guaranteed in pipelines.

Once the launchpad has been set up you should not experience this issue anymore, as rover will configure Terraform to use remote tfstates.

If you have a different scenario, please add it in a comment.

bobalong79 commented 2 years ago

Thanks @LaurentLesle. We're running from a Jenkins pipeline that uses an ephemeral rover image for the execution, but we can probably find a workaround using the cached state as you mention.
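One way such a workaround might look with an ephemeral container is to copy the local state cache out to persistent storage at the end of a run and restore it before the next one. The sketch below is an assumption, not something described in this thread: save_state_cache and restore_state_cache are hypothetical helpers, and both directories are placeholders supplied by the caller, since the location of rover's cache is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and both directory arguments are hypothetical.

# Copy the local state cache into persistent storage (e.g. a mounted volume).
save_state_cache() {
    local cache_dir="$1" persist_dir="$2"
    [ -d "${cache_dir}" ] || return 0   # nothing cached, nothing to save
    mkdir -p "${persist_dir}"
    tar -C "${cache_dir}" -cf "${persist_dir}/state-cache.tar" .
}

# Restore a previously saved cache before the next run, if one exists.
restore_state_cache() {
    local cache_dir="$1" persist_dir="$2"
    [ -f "${persist_dir}/state-cache.tar" ] || return 0
    mkdir -p "${cache_dir}"
    tar -C "${cache_dir}" -xf "${persist_dir}/state-cache.tar"
}
```

A pipeline could call restore_state_cache before starting rover and save_state_cache in a post-build step that runs on failure as well as success. Terraform state can contain sensitive values, so the persistent location should be access-controlled.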