cloudposse / terraform-aws-rds-cluster

Terraform module to provision an RDS Aurora cluster for MySQL or Postgres
https://cloudposse.com/accelerate
Apache License 2.0

reopen #192 #213

Closed finchr closed 4 months ago

finchr commented 5 months ago

what: I implemented create_before_destroy on the aws_rds_cluster_instance default instances. Originally in #192, but that was closed for reasons we won't go into here.

why: Making a change to any parameter that triggers a replace on an aws_rds_cluster_instance results in all instances being destroyed before a new instance is created, which causes an outage. This is a faster (and safer) alternative to https://github.com/cloudposse/terraform-aws-rds-cluster/pull/191

references: This closes https://github.com/cloudposse/terraform-aws-rds-cluster/issues/190 and is an alternative to https://github.com/cloudposse/terraform-aws-rds-cluster/pull/191
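
The change boils down to adding a create_before_destroy lifecycle block to the instance resource. A minimal sketch (attribute and variable names here are illustrative, not the module's exact resource definition):

```hcl
# With create_before_destroy, Terraform provisions the replacement instance
# first and only then destroys the old one, so a parameter change that forces
# replacement no longer takes the whole cluster offline.
resource "aws_rds_cluster_instance" "default" {
  count              = var.cluster_size
  cluster_identifier = aws_rds_cluster.primary.id
  instance_class     = var.instance_type
  engine             = var.engine

  lifecycle {
    create_before_destroy = true
  }
}
```

Because the old and new instances briefly exist at the same time, they cannot share an identifier, which is why the module involves a random suffix (via random_pet) in the instance name.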

joe-niland commented 5 months ago

These changes were released in v1.10.0.

osterman commented 5 months ago

Sadly, it looks like this triggers some Terraform bug. Have you run the terratest tests locally?

(screenshot: Terratest failure output)

osterman commented 5 months ago

/terratest

finchr commented 5 months ago

Hi @osterman, I found the issue and did another push. The random_pet did not handle the enabled == false condition.

osterman commented 5 months ago

/terratest

osterman commented 4 months ago

/terratest

Benbentwo commented 4 months ago

It looks like the tests are now failing on "disabled (enabled = false): should not create any resources". I believe this should be fixed by

resource "random_pet" "..." {
  count = local.enabled ? 1 : 0
  ...
}

Note: you then need to update references to the random_pet to use a list, such as

one(random_pet.instance[*].keepers.instance_class)
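
Putting the count guard and the splat reference together, a hedged sketch (the keepers, names, and identifier format are illustrative, based on the module's random_pet usage described above):

```hcl
# Gate random_pet on the module-wide enabled flag so that enabled = false
# creates no resources at all.
resource "random_pet" "instance" {
  count = local.enabled ? 1 : 0

  # keepers force a new pet name (and thus a new instance identifier)
  # whenever a value that requires instance replacement changes.
  keepers = {
    instance_class = var.instance_type
  }
}

resource "aws_rds_cluster_instance" "default" {
  count = local.enabled ? var.cluster_size : 0

  # random_pet.instance is now a list, so take its single element with one().
  identifier     = "${var.name}-${count.index}-${one(random_pet.instance[*].id)}"
  instance_class = one(random_pet.instance[*].keepers.instance_class)
  # ...

  lifecycle {
    create_before_destroy = true
  }
}
```

one() returns the single element of a zero-or-one-element list (or null when the list is empty), which is exactly the shape a count-gated resource produces.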

finchr commented 4 months ago

> It looks like the tests are now failing on "disabled (enabled = false): should not create any resources". I believe this should be fixed by
>
>     resource "random_pet" "..." {
>       count = local.enabled ? 1 : 0
>       ...
>     }
>
> Note: you then need to update references to the random_pet to use a list, such as
>
>     one(random_pet.instance[*].keepers.instance_class)

Hi @Benbentwo, that makes sense and is better than the hack I had in there. I just pushed a fix for this.

GabisCampana commented 4 months ago

@Benbentwo @osterman this workflow is awaiting approval: https://github.com/cloudposse/terraform-aws-rds-cluster/actions/runs/9372514860

Benbentwo commented 4 months ago

/terratest

Benbentwo commented 4 months ago

/terratest

morremeyer commented 4 months ago

Hey everyone, with the upgrade to 1.10.0, Terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.

Is this intentional?

I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

syphernl commented 4 months ago

> Hey everyone, with the upgrade to 1.10.0, Terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Based on @finchr's comment in https://github.com/cloudposse/terraform-aws-rds-cluster/pull/192#issuecomment-1971626417 it should only replace the DB instances. It does feel a bit scary, and a remark about this in the changelog would probably have been a good idea 😅

I ran this update on one of our test envs. It looked a bit scary: the AWS console stated it was deleting the "Writer Instance" without the new one having been promoted to writer first, so it appears this could potentially cause brief downtime for writes.

kevcube commented 4 months ago

> It does feel a bit scary and a remark about this in the changelog would've probably been a good idea 😅

@syphernl @Benbentwo agreed; while this isn't a breaking change, some sort of WARNING line in the changelog would be a good addition in the future.

But running this fully hands-off in our dev environment, we did not see an interruption in our application's connection to the DB. So good work @finchr!

finchr commented 4 months ago

> Hey everyone, with the upgrade to 1.10.0, Terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Hi @morremeyer, the intent was to be able to update a cluster with near-zero downtime. We ran into several scenarios where Terraform was replacing instances anyway, without the benefit of create_before_destroy. Plus, this is a lot faster than in-place updates of existing nodes.