cloudfoundry-community / terraform-provider-cloudfoundry

Terraform Cloud Foundry Provider
https://registry.terraform.io/providers/cloudfoundry-community/cloudfoundry/latest
Mozilla Public License 2.0

blue green strategy: add delay before deleting venerable app #569

Open Cocossoul opened 2 months ago

Cocossoul commented 2 months ago

Some apps can take a few moments to be fully up even after Cloud Foundry has declared them "started": database schema migrations, establishing connections to clients, and so on.

We have had some reports of downtime because the venerable app was deleted before the new app was fully "started", even though Cloud Foundry was showing it as started. This is not a bug in Cloud Foundry; it is more of a practical issue on our side.

The workaround we found is to add a preconfigured delay before killing the venerable app, and I think it could be useful to others in the same situation.
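The proposed workaround can be sketched as follows. This is a minimal Python illustration of the idea, not the provider's Go implementation: the `FakeCF` client, its `rename`/`push_and_start`/`delete` methods, the `-venerable` naming and the `delay_seconds` parameter are hypothetical names chosen for the example, and route mapping is omitted. The delay defaults to 0, so behaviour only changes for apps that opt in.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeCF:
    """Hypothetical stand-in for a Cloud Foundry client; it only records calls."""
    events: list = field(default_factory=list)

    def rename(self, old: str, new: str) -> None:
        self.events.append(("rename", old, new))

    def push_and_start(self, name: str) -> None:
        # A real client would return once Cloud Foundry reports the app as "started".
        self.events.append(("start", name))

    def delete(self, name: str) -> None:
        self.events.append(("delete", name))


def blue_green_deploy(cf, app_name: str, delay_seconds: float = 0, sleep=time.sleep) -> None:
    """Blue-green deploy with an optional grace period before deleting the old app.

    `delay_seconds` is a hypothetical setting; 0 keeps the delete-immediately behaviour.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be >= 0")
    venerable = f"{app_name}-venerable"
    cf.rename(app_name, venerable)   # the old (blue) app keeps running under a new name
    cf.push_and_start(app_name)      # the new (green) app is "started" from CF's point of view
    if delay_seconds > 0:
        sleep(delay_seconds)         # grace period: migrations, connection warm-up, ...
    cf.delete(venerable)             # only now remove the old app


if __name__ == "__main__":
    cf = FakeCF()
    blue_green_deploy(cf, "my-app", delay_seconds=0.1)
    print(cf.events)
```

Injecting `sleep` keeps the sketch testable without real waiting; the ordering (start green, wait, delete venerable) is the part the proposal changes.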

loafoe commented 4 weeks ago

@Cocossoul could you describe this flag in the documentation as well? Otherwise LGTM 👍🏻

Cocossoul commented 3 weeks ago

@sleungcy does the documentation I added seem all right? I'm not really sure how to go about this, so I'm open to suggestions.

I'm planning to use the "workarounds" section for the option in https://github.com/cloudfoundry-community/terraform-provider-cloudfoundry/pull/570 as well.

sleungcy commented 3 weeks ago

@Cocossoul I'd say it makes sense to have this default to 0, and only enable it for the applications that need the extra time.

However, I have one concern: an application should not return a successful health check until it is really ready to serve traffic. If the application reports its health status correctly, you should not run into the scenario where the venerable app is destroyed before the green version of the app is ready.
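The reviewer's alternative can be illustrated with a readiness-aware HTTP endpoint that answers 503 until start-up work has finished, so a Cloud Foundry `http` health check (set in the manifest via `health-check-type: http` and `health-check-http-endpoint`) only sees a 200 once the app can actually serve traffic. This is a minimal standard-library Python sketch; the `/health` path and the `run_migrations` and `connect_clients` placeholders are hypothetical.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set only after all start-up work is done.
ready = threading.Event()


def run_migrations() -> None:
    """Hypothetical placeholder for database schema migrations."""


def connect_clients() -> None:
    """Hypothetical placeholder for opening connections to downstream services."""


def startup() -> None:
    run_migrations()
    connect_clients()
    ready.set()  # from now on the health check reports success


def health_status() -> int:
    # 503 means "not ready yet"; the http health check expects a 200.
    return 200 if ready.is_set() else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status = health_status()
        body = b"ok" if status == 200 else b"starting"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    # Cloud Foundry tells the app which port to listen on via $PORT.
    port = int(os.environ.get("PORT", "8080"))
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    # Listen immediately, but report "starting" until startup() completes.
    threading.Thread(target=startup, daemon=True).start()
    server.serve_forever()
```

The app's entry point would call `serve()`. With this pattern the platform's own notion of "started" already includes the migration and warm-up time, which is why no fixed delay should be needed.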

The extra time each application needs will be hard to gauge: setup time may vary between landscapes, locations, regions, and app versions. Tracking and maintaining this delay will add technical debt and overhead for the teams. Hardcoding a large value once might seem like a solution, but it leaves two copies of the app running, consuming costs and quota. In large deployments, one application can have over 100 instances, each with 8GB of memory; doubling that amount significantly reduces the quota available to other applications. It is preferable to delete the old app immediately once the new app is ready.
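The quota argument can be made concrete with a little arithmetic using the figures from the comment: 100 instances × 8GB is 800GB per copy of the app, so 1600GB while both copies run. The 5-minute delay and the GB-hours metric in this Python sketch are illustrative assumptions, not numbers from the discussion.

```python
def blue_green_overlap(instances: int, memory_gb: float, delay_seconds: float):
    """Return (peak memory in GB while both apps run, extra GB-hours caused by the delay)."""
    per_app_gb = instances * memory_gb
    peak_gb = 2 * per_app_gb                             # old and new app side by side
    extra_gb_hours = per_app_gb * delay_seconds / 3600   # old app kept alive only for the delay
    return peak_gb, extra_gb_hours


peak, extra = blue_green_overlap(instances=100, memory_gb=8, delay_seconds=300)
print(f"peak: {peak} GB, extra: {extra:.1f} GB-hours per deployment")
# → peak: 1600 GB, extra: 66.7 GB-hours per deployment
```

The delay does not raise the peak; it extends how long the doubled footprint holds quota that other applications cannot use.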