cloudposse / terraform-aws-rds-cluster

Terraform module to provision an RDS Aurora cluster for MySQL or Postgres
https://cloudposse.com/accelerate
Apache License 2.0

reopen #192 #213

Closed finchr closed 4 months ago

finchr commented 5 months ago

what: I implemented create_before_destroy on the aws_rds_cluster_instance default instances. Originally in #192, but that was closed for reasons we won't go into here.

why: Making a change to any parameter that triggers a replace on an aws_rds_cluster_instance results in all instances being destroyed before a new instance is created, which causes an outage. This is a faster (and safer) alternative to https://github.com/cloudposse/terraform-aws-rds-cluster/pull/191

references: This closes https://github.com/cloudposse/terraform-aws-rds-cluster/issues/190 and is an alternative to https://github.com/cloudposse/terraform-aws-rds-cluster/pull/191
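
The change boils down to adding a create_before_destroy lifecycle block to the instance resource. A minimal sketch (attribute and variable names here are illustrative, not the module's exact resource definition):

```hcl
# With create_before_destroy, Terraform provisions the replacement instance
# first and only then destroys the old one, so a parameter change that forces
# replacement no longer takes the whole cluster offline.
resource "aws_rds_cluster_instance" "default" {
  count              = var.cluster_size
  cluster_identifier = aws_rds_cluster.primary.id
  instance_class     = var.instance_type
  engine             = var.engine

  lifecycle {
    create_before_destroy = true
  }
}
```

Because the old and new instances briefly exist at the same time, they cannot share an identifier, which is why the module involves a random suffix (via random_pet) in the instance name.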

joe-niland commented 5 months ago

These changes were released in v1.10.0.

osterman commented 5 months ago

Sadly, it looks like this triggers some Terraform bug. Have you run the terratest tests locally?

(screenshot: Terratest failure output)

osterman commented 5 months ago

/terratest

finchr commented 5 months ago

Hi @osterman, I found the issue and did another push. The random_pet did not handle the enabled == false condition.

osterman commented 5 months ago

/terratest

osterman commented 4 months ago

/terratest

Benbentwo commented 4 months ago

It looks like the tests are now failing on "disabled (enabled = false): should not create any resources". I believe this should be fixed by

resource "random_pet" "..." {
  count = local.enabled ? 1 : 0
  ...
}

Note: you then need to update references to the random_pet to use a list, such as

one(random_pet.instance[*].keepers.instance_class)
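
Putting the count guard and the splat reference together, a hedged sketch (the keepers, names, and identifier format are illustrative, based on the module's random_pet usage described above):

```hcl
# Gate random_pet on the module-wide enabled flag so that enabled = false
# creates no resources at all.
resource "random_pet" "instance" {
  count = local.enabled ? 1 : 0

  # keepers force a new pet name (and thus a new instance identifier)
  # whenever a value that requires instance replacement changes.
  keepers = {
    instance_class = var.instance_type
  }
}

resource "aws_rds_cluster_instance" "default" {
  count = local.enabled ? var.cluster_size : 0

  # random_pet.instance is now a list, so take its single element with one().
  identifier     = "${var.name}-${count.index}-${one(random_pet.instance[*].id)}"
  instance_class = one(random_pet.instance[*].keepers.instance_class)
  # ...

  lifecycle {
    create_before_destroy = true
  }
}
```

one() returns the single element of a zero-or-one-element list (or null when the list is empty), which is exactly the shape a count-gated resource produces.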

finchr commented 4 months ago

> It looks like the tests are now failing on "disabled (enabled = false): should not create any resources". I believe this should be fixed by
>
>     resource "random_pet" "..." {
>       count = local.enabled ? 1 : 0
>       ...
>     }
>
> Note: you then need to update references to the random_pet to use a list, such as
>
>     one(random_pet.instance[*].keepers.instance_class)

Hi @Benbentwo, that makes sense and is better than the hack I had in there. I just pushed a fix for this.

GabisCampana commented 4 months ago

@Benbentwo @osterman this workflow is awaiting approval: https://github.com/cloudposse/terraform-aws-rds-cluster/actions/runs/9372514860

Benbentwo commented 4 months ago

/terratest

Benbentwo commented 4 months ago

/terratest

morremeyer commented 4 months ago

Hey everyone, with the upgrade to 1.10.0, Terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.

Is this intentional?

I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

syphernl commented 4 months ago

> Hey everyone, with the upgrade to 1.10.0, Terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Based on @finchr's comment in https://github.com/cloudposse/terraform-aws-rds-cluster/pull/192#issuecomment-1971626417 it should only replace the DB instances. It does feel a bit scary, and a remark about this in the changelog would probably have been a good idea 😅

I ran this update on one of our test envs. It looked a bit scary: the AWS console stated it was deleting the "Writer Instance" without the new one having been promoted to writer first, so it appears this could potentially cause brief downtime for writes.

kevcube commented 4 months ago

> It does feel a bit scary and a remark about this in the changelog would've probably been a good idea 😅

@syphernl @Benbentwo agreed; while this isn't a breaking change, some sort of WARNING line in the changelog would be a good addition in the future.

But running this fully hands-off in our dev environment, we did not see an interruption in our application's connection to the DB. So good work @finchr!

finchr commented 4 months ago

> Hey everyone, with the upgrade to 1.10.0, Terraform would destroy and recreate all instances for all our clusters, since the identifier for aws_rds_cluster_instance.default[0] changes, which forces a replacement.
>
> Is this intentional?
>
> I understand this PR will prevent outages since the instance now has create_before_destroy, but I want to make sure that this is the intended behavior for the upgrade.

Hi @morremeyer, the intent was to be able to update a cluster with near-zero downtime. We ran into several scenarios where Terraform was replacing instances anyway, without the benefit of create_before_destroy. Plus, this is a lot faster than in-place updates of existing nodes.