department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html
277 stars 194 forks source link

Discovery Post-Postgres Upgrade: Possible data loss during database upgrade/ maintenance #68704

Open LindseySaari opened 8 months ago

LindseySaari commented 8 months ago

User Story

As the managers of the Postgres tool, We want to understand the risks around upgrades or maintenance to the tool and have a strategy documented for how said upgrades or maintenance are to be performed, So that we can avoid any data loss issues during future database upgrades and maintenance windows.

Description

The API application, which connects to an RDS database, can be facing data loss issues during database upgrades/ maintenance. When the database upgrade process is initiated, the API attempts to make calls to the database, but these calls are failing, resulting in data loss. We now have BlueGreen, and we want to do some discovery to see if this is something we want to test before the next Postgres Upgrade (date TBC).

Tasks

Confirm if there is dataloss:

Additional Information:

Proposed Actions:

Success Metrics

Acceptance Criteria

Validation

Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.

LindseySaari commented 2 months ago

Some things to explore:

  1. Run a test on a replica and see what happens during the blue/green deployment
  2. For Sidekiq: explore additional storage layers. Should we back Sidekiq by postgres?
  3. Other options, etc
Kshitiz-devops commented 1 week ago

Limitation of Postgres blue green upgrade:

PostgresSQL logical replication limitations have the following implications on blue/green deployments: Data definition language (DDL) commands are not replicated to the green environment. NEXTVAL operations on sequence objects are not incremented in the green environment. However, during switchover, Amazon RDS increments sequence values in the green environment to match those in the blue environment. Large objects (pg_largeobject) are not replicated to the green environment. Materialized views on the green environment must be independently refreshed. All tables in the blue environment must have primary keys.

More Info: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/blue-green-deployments-overview.html#blue-green-deployments-limitations-postgres

jennb33 commented 6 days ago

Per 6/26 Standup: @Kshitiz-devops has been doing testing. Wondered if we can get a continuous connection to the database, so that we can set up notifications when data We could continually hit the endpoint; Kshitiz will thread a script for next steps.

jennb33 commented 5 days ago

6/27 stand-up update: @Kshitiz-devops started testing on Wednesday and is starting to compare the update of the data sets in one of the tables.

jennb33 commented 1 day ago

7/1: Blue/Green deploys are not seeing any data loss. Ticket also includes other steps for QA systems.