boltops-tools / terraspace

Terraspace: The Terraform Framework
https://terraspace.cloud
Apache License 2.0

[BUG] Intermittent output helper destroying resources #306

Open exoaturner opened 1 year ago

exoaturner commented 1 year ago

My Environment

Software         | Version
Operating System | Ubuntu 22.04
Terraform        | 1.4.6
Terraspace       | 2.2.6
Ruby             | 3.1.2p20

Expected Behaviour

During terraspace all (plan|up), the output helper for tfvars should consistently find values in the Terraform state file.

Current Behavior

Under unknown conditions, the output helper doesn't always find existing values in the Terraform state file, which can result in resources being destroyed unintentionally, for example when a KMS key ID falls back to its mock value.

Step-by-step reproduction instructions

Run terraspace all up.

Code Sample

Picture of problem: tfvars_compiled

I am unable to share exact reproduction steps because the problem is intermittent. I am also unable to share a code sample because of company security policies. However, I have done my own investigation and found some things that should help with implementing a fix; see below.

The temporary state files that terraspace pulls from the Terraform state (in the /tmp/terraspace/remote_state/ directory) do contain the values. This suggests that terraspace either isn't reading them correctly or isn't reading them at the correct time (a possible race condition).

The image below is a snapshot of one of the values in the state file that wasn't populated in my most recent experience of the problem:

As you can see the value for vpc_endpoint_ec2messages_id is in the statefile:

(I had to change the directory because of CICD) state

This means terraspace/terraform is getting the state correctly; however, it appears terraspace is not always loading it correctly. This is inconsistent across the deployment, even though other stacks use the same outputs (this is why I believe it's a race condition).
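
One way to confirm this when the problem occurs is to check the pulled state file directly for the outputs a stack expects. The following is a minimal Ruby sketch, not part of Terraspace: `missing_outputs` is a hypothetical helper, and it assumes the standard Terraform state JSON layout with a top-level `outputs` map keyed by output name.

```ruby
require "json"

# Hypothetical helper (not a Terraspace API).
# Given the JSON text of a Terraform state file and a list of expected
# output names, returns the names that are absent from the "outputs" map.
def missing_outputs(state_json, expected)
  outputs = JSON.parse(state_json).fetch("outputs", {})
  expected.reject { |name| outputs.key?(name) }
end
```

It could be called with the contents of a file under /tmp/terraspace/remote_state/ (read with `File.read`) and a list such as `["vpc_endpoint_ec2messages_id"]`. An empty result for a stack whose compiled tfvars still contain mocks would point at the loading step rather than at the state itself.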

As you can see from the code snapshot below, the output helpers are set up correctly: tfvars_template

Solution Suggestion

I have two things to mention regarding this issue.

  1. Obviously the above is a problem and should be addressed somehow (I'm not sure exactly how).
  2. Terraspace should fail hard if mocks are detected in the compiled terraform code when deploying (after templating).

exoaturner commented 1 year ago

We have implemented a workaround that's worth sharing. This bug can still be annoying when large projects fail to deploy, but the workaround makes it less likely that any infrastructure is destroyed by accident.

In the ./config/hooks/terraform.rb file we added some hooks for detecting mocks to stop the deployments so we don't break anything.

# Terraspace calls out to the terraform command.
# You can execute commands before and after each command with CLI hooks.
#
# See: https://terraspace.cloud/docs/config/hooks/terraform

# WARNING: A hack to stop terraspace from deploying a stack if a mock value is found.
#
# Checks for the keyword 'mock' because it is commonly used in mock names
# Checks for 0000000000 because it is commonly used in mock ids
# Checks for 10.0. because it is commonly used in mock ips and cidrs

# Note: inside a Ruby double-quoted string "\." collapses to ".", so the dots
# are written as "\\." to reach grep as literal-dot escapes.
before("plan",
  label: 'Warn about found mocks at plan stage',
  execute: "! grep -qsEi '(mock|0000000000|10\\.0\\.)' *.tfvars || echo \"\033[0;33mWARNING: Found mock values in $(basename $(pwd))\033[0m\" ",
  exit_on_fail: false,
)
before("apply",
  label: 'Fail if mock values found at deploy stage',
  execute: "! grep -qsEi '(mock|0000000000|10\\.0\\.)' *.tfvars || { echo \"\033[0;31mFAILURE: Found mock values in $(basename $(pwd))\033[0m\"; false; } ",
  exit_on_fail: true,
)

This is not a permanent fix, because affected deployments will still fail.
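
For anyone who prefers to see which lines triggered the check rather than a silent grep, the same pattern can be expressed in plain Ruby. This is a minimal sketch under the same assumptions as the hooks above (mocks contain 'mock', a run of ten zeros, or '10.0.'); `MOCK_PATTERN` and `mock_lines` are hypothetical names, and how the check gets wired into a hook is left out.

```ruby
# Same heuristic as the grep in the hooks: case-insensitive match on
# 'mock', ten consecutive zeros, or the literal prefix '10.0.'.
MOCK_PATTERN = /mock|0000000000|10\.0\./i

# Hypothetical helper: returns [line_number, line] pairs for every line
# of the given tfvars text that looks like it contains a mock value.
def mock_lines(tfvars_text)
  tfvars_text.each_line.with_index(1)
             .select { |line, _number| line.match?(MOCK_PATTERN) }
             .map { |line, number| [number, line.chomp] }
end
```

It could be run over each compiled file, for example `Dir.glob("*.tfvars").each { |f| p mock_lines(File.read(f)) }`. Like the grep version, it will also flag real values that happen to match, such as a genuine 10.0.x.x CIDR.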