concourse / concourse-bosh-deployment

A toolchain for deploying Concourse with BOSH.
Apache License 2.0

Zero downtime deployment #167

Closed ashleystendel closed 5 years ago

ashleystendel commented 5 years ago

Feature Request

What challenge are you facing?

We currently have a large Concourse installation used by over 180 development teams (800+ developers). We have 40 workers and 6 ATCs, and an upgrade that requires recreating all VMs can take over an hour to recreate them all at the same time. We know we can use a small max in flight; however, the service is degraded for the duration and the update would take a very long time. When the database is stopped, updating, or migrating, the service is completely interrupted.
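To make the max-in-flight trade-off above concrete, here is a rough back-of-the-envelope sketch. It assumes BOSH updates instances in sequential batches of at most `max_in_flight` and that every VM takes about the same time to recreate. The 5-minute per-VM figure is a hypothetical placeholder, not a number from this thread, and canaries and the ATC and database instances are ignored.

```python
import math


def rolling_update_minutes(instances: int, max_in_flight: int, minutes_per_vm: float) -> float:
    """Estimate wall-clock time when instances are recreated in sequential batches."""
    if instances < 1 or max_in_flight < 1:
        raise ValueError("instances and max_in_flight must both be at least 1")
    batches = math.ceil(instances / max_in_flight)
    # Assumption: each batch takes as long as one VM recreate.
    return batches * minutes_per_vm


def capacity_during_update(instances: int, max_in_flight: int) -> float:
    """Fraction of instances still serving while one full batch is being recreated."""
    if instances < 1 or max_in_flight < 1:
        raise ValueError("instances and max_in_flight must both be at least 1")
    in_flight = min(instances, max_in_flight)
    return (instances - in_flight) / instances


if __name__ == "__main__":
    WORKERS = 40          # worker count from the issue
    MINUTES_PER_VM = 5.0  # hypothetical recreate time per VM
    for m in (1, 4, 10, 40):
        minutes = rolling_update_minutes(WORKERS, m, MINUTES_PER_VM)
        capacity = capacity_during_update(WORKERS, m)
        print(f"max_in_flight={m:>2}: ~{minutes:.0f} min, {capacity:.0%} of workers available")
```

Under these assumptions, a smaller batch size keeps more workers available but stretches the update proportionally, which is the tension described above. None of this addresses the database, which remains a single point of interruption whatever the batch size.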

What would make this better?

It would be nice to have an option to update Concourse with zero or minimal downtime to minimize the impact on development teams during updates.

vito commented 5 years ago

If we had such an option it would just become the default. :slightly_smiling_face: In fact that is already the intended behavior, but zero-downtime upgrades of e.g. Postgres are a bit outside our realm of expertise (and, I would argue, responsibility).

I'm going to close this as we really don't have the resources to sink into automating a zero-downtime Postgres upgrade, but I fully agree with you in spirit. If there are Concourse-specific zero-downtime changes to suggest I'm all ears (and those could be opened as feature requests on https://github.com/concourse/concourse).