Ideas for skipping cleanup on error?

GoogleCloudPlatform / terraform-python-testing-helper

Simple Python test helper for Terraform.

https://pypi.org/project/tftest/

Apache License 2.0

214 stars 31 forks source link

Ideas for skipping cleanup on error? #41

Open lorengordon opened 2 years ago

lorengordon commented 2 years ago

Just ran an apply/destroy cycle through tftest, where on destroy I ran into a timing error when waiting for a resource to change to the expected state:

Error: Error waiting for VPC Peering Connection (pcx-....) to be deleted: timeout while waiting for state to become 'rejected, deleted' (last state: 'active', timeout: 1m0s)

Ok, not great, but no big deal, just go into the test directory and re-run destroy... Except, the tfstate file is gone, so can't do that!

lorengordon commented 2 years ago

For now, since we always run apply/destroy in our current usage, I'm gonna use setup(upgrade=True, cleanup_on_exit=False)... That way the .terraform directory will always be refreshed, and with a working destroy the tfstate file should always be empty anyway...

ludoo commented 2 years ago

I was going to suggest that. If you get a better idea let's see if it can be implemented.

lorengordon commented 2 years ago

I'm not super familiar with the _finalizer thing, which I assume is a feature of pytest? Right now, it is only configured in the setup method, and the other methods do not influence it...

grahamhar commented 2 years ago

We wrap the setup apply and destroy in a fixture:

@pytest.fixture(scope='module')
def output(scenarios_dir, tf_vars, scenario):
    print(f'Setup starting for {scenario}')
    tf = tftest.TerraformTest(scenario, scenarios_dir)
    tf.setup()
    try:
        tf.apply(tf_vars=tf_vars)
        print(f'setup complete for {scenario}')
        yield tf.output()
        tf.destroy(**{'auto_approve': True}, tf_vars=tf_vars)
    except Exception as exc:
        print(f'Issue during apply, attempting to clean up {exc}')
        tf.destroy(**{'auto_approve': True}, tf_vars=tf_vars)

It isn't perfect, but effectively puts a retry on destroy if a transient error occurs.

ludoo commented 2 years ago

Ach, completely forgot to answer this, sorry!

@grahamhar I'm struggling to understand your solution (my fault ofc): setup() has cleanup_on_exit defaulting to True, so cleanup of state files in the finalizer happens anyway?

IMHO a simple way of avoiding the above is of course to pass cleanup_on_exit=False to setup(), then trapping the potential exception raised during destroy as in Graham's example, and cleaning up state manually only if destroy succeeds.

grahamhar commented 2 years ago

@ludoo Sorry I probably should have provided more explanation, let me provide a few examples of how I think the approach I suggested helps.

Issue with Terraform code under test that results in a failure under the apply phase, this captures the exception thrown from the apply and runs a destroy which should clean up any resources already created. As Terraform in many cases doesn't validate parameters it's easy to get 400 type response back which halt execution leaving some resources created.
Issue in Destroy caused by a timeout, generally when this happens a second run of destroy would rectify this, not perfect but we've found this to be the case.
Occasionally we get race conditions like a policy on an S3 bucket trying to be deleted while another S3 operation is underway, again the second attempt in the except block normally resolves this.

I agree it is probably a good idea to set cleanup_on_exit to False.

lorengordon commented 2 years ago

@grahamhar With that config, what does pytest report if the apply succeeds, the first destroy fails, and the second destroy succeeds? Is it still a test pass (I would think so...)?

grahamhar commented 2 years ago

Yes you get a test pass which in our use case is what we want.

If that is not what you want add a pytest.fail to the except block