Open ben851 opened 6 months ago
I've migrated the staging database to dev and have been running tests on how best to upgrade to 11.21 and 15.x.
So far, blue/green will not work because it doesn't support RDS Proxy.
All zero-downtime migration options require a later version of Postgres, so we would have to schedule downtime first to upgrade to a version that supports them.
I'm going to run a soak test while upgrading to 11.21, then another while upgrading to 15.x, and compare the outage times. We may be better off just doing a one-time "big" upgrade (assuming it tests well in real staging).
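For comparing outage times between the two upgrade paths, here is a minimal sketch of how the downtime could be derived from soak test probe results. It assumes the soak test records a timestamp and a success flag per request; the `Probe` type and `longest_outage` function are hypothetical names, not part of any existing tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """One soak-test request (hypothetical record shape)."""
    timestamp: float  # seconds since the start of the test
    ok: bool          # True if the request succeeded


def longest_outage(probes: List[Probe]) -> float:
    """Return the longest outage in seconds.

    An outage runs from the first failed probe to the next successful one.
    An outage still open at the end is measured up to the last probe.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    for probe in probes:
        if not probe.ok and outage_start is None:
            outage_start = probe.timestamp
        elif probe.ok and outage_start is not None:
            longest = max(longest, probe.timestamp - outage_start)
            outage_start = None
    if outage_start is not None and probes:
        longest = max(longest, probes[-1].timestamp - outage_start)
    return longest
```

The resolution of the measurement is limited by the probe interval, so a one-second interval would be needed to report something like a sub-minute outage with any precision.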
We might want to either
The first option would do a better job of mimicking prod
Note that we could disable RDS Proxy during the upgrade
We should also raise this to AWS SMEs on RDS to have their take on our migration.
Here are the preliminary results of doing a straight database upgrade.
https://docs.google.com/document/d/1X3ykvlqhdfVniU8LkN9drWar1NFgWGK62xDOniEDeAI/edit
Of particular note, the upgrade from 11 to 15 caused 12 minutes of downtime.
Today I'm going to look into removing the proxy in dev
I managed to do a blue/green switchover in dev using clickops. Originally it didn't look very promising, but then I realized that the initial switchover failed due to a timeout. Doing it again with a longer timeout resulted in an upgrade with little downtime.
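For reference, the same switchover with a longer timeout can also be requested through the API rather than the console. This is only a sketch: it assumes a boto3 RDS client is passed in and that the blue/green deployment already exists. `switchover_with_timeout`, the 900 second default, and the deployment ID in the usage note are placeholders, not the values used in dev.

```python
def switchover_with_timeout(rds_client, deployment_id: str, timeout_seconds: int = 900) -> dict:
    """Request a blue/green switchover with an explicit timeout.

    rds_client is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    If the switchover does not finish within the timeout, RDS rolls it back.
    """
    # Assumption: the API does not accept timeouts shorter than 30 seconds.
    if timeout_seconds < 30:
        raise ValueError("timeout_seconds must be at least 30")
    return rds_client.switchover_blue_green_deployment(
        BlueGreenDeploymentIdentifier=deployment_id,
        SwitchoverTimeout=timeout_seconds,
    )
```

Usage would look something like `switchover_with_timeout(boto3.client("rds"), "bgd-example")`, where `bgd-example` stands in for the real deployment identifier.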
Unfortunately, while AWS supports blue/green with Aurora, the Terraform AWS provider does not: https://github.com/hashicorp/terraform-provider-aws/blob/main/docs/design-decisions/rds-bluegreen-deployments.md
I'm going to look into what we can do about this.
I've created a set of scripts that will do the database migration. There's still some refining to do after the code freeze.
@sastels to review this PR and proceed with testing in dev.
@sastels left some suggestions on the PR - I will work on implementing those at some point.
@sastels will be doing a migration test in dev this morning.
@sastels and I worked through the first step of the migration yesterday, tracking issues. We're currently debugging why patches don't work on his system.
We will aim to do the 11.21 upgrade on Monday. We need to add the logical replication parameter to the parameter group before then so that we don't have to restart the database in the future.
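As a rough illustration of the parameter change, the sketch below builds the request for enabling logical replication on a cluster parameter group. It assumes an Aurora PostgreSQL cluster parameter group and the `rds.logical_replication` parameter; `logical_replication_request` and the group name in the usage note are hypothetical. The parameter is static, which is why it is applied with `pending-reboot` and picked up by the restart that the upgrade already causes.

```python
def logical_replication_request(parameter_group_name: str) -> dict:
    """Build keyword arguments for boto3's modify_db_cluster_parameter_group.

    Assumes an Aurora PostgreSQL cluster parameter group. The change only
    takes effect after the next reboot of the cluster's instances.
    """
    return {
        "DBClusterParameterGroupName": parameter_group_name,
        "Parameters": [
            {
                "ParameterName": "rds.logical_replication",
                "ParameterValue": "1",
                # Static parameter: it cannot be applied immediately.
                "ApplyMethod": "pending-reboot",
            }
        ],
    }
```

It could then be applied with `boto3.client("rds").modify_db_cluster_parameter_group(**logical_replication_request("example-cluster-params"))`, with the placeholder replaced by the real group name.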
Jimmy to inform the Notify team.
The minor version upgrade to 11.21 happened yesterday evening around 21h26 EST, with 55 seconds of downtime. 👏🎉
Description
As a developer of notify, I would like our database to be up to date so that we can ensure we have the latest security patches and support from AWS.
WHY are we building?
We currently have automatic updates disabled, so we are behind on minor versions of Postgres. We need to upgrade off of 11.17 before January 16th.
WHAT are we building?
Click-ops the upgrade in dev, while running a soak test to see if there is any downtime.
VALUE created by our solution
Increased stability and security. We will retain support from AWS.
Acceptance Criteria
QA Steps