department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit the complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Discovery Post-Postgres Upgrade: Possible data loss during database upgrade/maintenance #68704

Closed LindseySaari closed 4 months ago

LindseySaari commented 1 year ago

User Story

As the managers of the Postgres tool, we want to understand the risks around upgrades and maintenance of the tool, and to have a documented strategy for how those upgrades and maintenance are to be performed, so that we can avoid data loss during future database upgrades and maintenance windows.

Description

The API application, which connects to an RDS database, may face data loss during database upgrades/maintenance. When the database upgrade process is initiated, the API continues to make calls to the database; these calls fail, resulting in data loss. We now have blue/green deployments, and we want to do some discovery to decide whether this is something to test before the next Postgres upgrade (date TBC).
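One mitigation worth exploring as part of this discovery: have the writer retry failed calls with backoff, so a short maintenance window surfaces as latency rather than silent loss. A minimal sketch, assuming Python with psycopg2 and a hypothetical `events` table (illustrative only, not the API's actual persistence code):

```python
import time

import psycopg2


def write_with_retry(dsn, sql, params, attempts=5):
    """Retry a write with exponential backoff so a brief outage during an
    upgrade shows up as latency instead of a silently dropped call."""
    delay = 1.0
    for attempt in range(attempts):
        try:
            # Reconnect on every attempt: the previous connection may be
            # dead after a failover or blue/green switchover.
            conn = psycopg2.connect(dsn)
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                conn.commit()
                return
            finally:
                conn.close()
        except psycopg2.OperationalError:
            if attempt == attempts - 1:
                raise  # out of retries; the caller must persist or alert
            time.sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ...


# Hypothetical DSN and table, for illustration only.
write_with_retry(
    "dbname=example",
    "INSERT INTO events (payload) VALUES (%s)",
    ("hello",),
)
```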

Tasks

Confirm whether there is data loss:

Additional Information:

Proposed Actions:

Success Metrics

Acceptance Criteria

Validation

Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.

LindseySaari commented 6 months ago

Some things to explore:

  1. Run a test on a replica and see what happens during the blue/green deployment
  2. For Sidekiq: explore additional storage layers. Should we back Sidekiq with Postgres?
  3. Other options, etc.
Kshitiz-devops commented 4 months ago

Limitations of the Postgres blue/green upgrade:

PostgreSQL logical replication limitations have the following implications for blue/green deployments:

- Data definition language (DDL) commands are not replicated to the green environment.
- NEXTVAL operations on sequence objects are not incremented in the green environment. However, during switchover, Amazon RDS increments sequence values in the green environment to match those in the blue environment.
- Large objects (pg_largeobject) are not replicated to the green environment.
- Materialized views on the green environment must be independently refreshed.
- All tables in the blue environment must have primary keys.

More Info: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/blue-green-deployments-overview.html#blue-green-deployments-limitations-postgres
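Of these, the primary-key requirement is the easiest to verify ahead of time. A small sketch (Python with psycopg2; the connection string is hypothetical) that lists tables in the blue environment without a primary key:

```python
import psycopg2

# Per the limitation above, every blue-environment table needs a
# primary key; list any that lack one before kicking off a deployment.
QUERY = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
      SELECT 1 FROM pg_constraint con
      WHERE con.conrelid = c.oid AND con.contype = 'p'
  );
"""

conn = psycopg2.connect("dbname=example")  # hypothetical DSN
try:
    cur = conn.cursor()
    cur.execute(QUERY)
    for schema, table in cur.fetchall():
        print(f"missing primary key: {schema}.{table}")
finally:
    conn.close()
```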

jennb33 commented 4 months ago

Per 6/26 stand-up: @Kshitiz-devops has been doing testing. We wondered if we can hold a continuous connection to the database, so that we can set up notifications when data is lost. We could continually hit the endpoint; Kshitiz will share a script in this thread for next steps.
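Not the script referenced above, but a rough sketch of what such a continuous probe could look like: insert a sequenced row every second across the switchover window, then look for gaps afterwards (the `probe_writes` table and DSN are hypothetical):

```python
import time

import psycopg2

DSN = "dbname=example"  # hypothetical connection string

# Run until interrupted, spanning the blue/green switchover.
seq = 0
while True:
    seq += 1
    try:
        conn = psycopg2.connect(DSN)
        try:
            cur = conn.cursor()
            cur.execute(
                "INSERT INTO probe_writes (seq, ts) VALUES (%s, now())",
                (seq,),
            )
            conn.commit()
        finally:
            conn.close()
    except psycopg2.OperationalError as exc:
        # Keep going: failures during the switchover are the data points.
        print(f"write {seq} failed: {exc}")
    time.sleep(1)

# Afterwards, gaps in seq identify writes that never landed:
#   SELECT s AS missing_seq
#   FROM generate_series(1, (SELECT max(seq) FROM probe_writes)) AS s
#   LEFT JOIN probe_writes p ON p.seq = s
#   WHERE p.seq IS NULL;
```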

jennb33 commented 4 months ago

6/27 stand-up update: @Kshitiz-devops started testing on Wednesday and is beginning to compare updates to the data sets in one of the tables.

jennb33 commented 4 months ago

7/1 stand-up update: Blue/green deploys are not showing any data loss. This ticket also includes other steps for QA systems.

Kshitiz-devops commented 4 months ago
| Feature/Aspect | RabbitMQ | Amazon SQS | Apache Kafka |
| --- | --- | --- | --- |
| Protocol Support | AMQP, STOMP, MQTT | AWS-specific | Kafka protocol |
| Message Delivery | Reliable with ACKs | At-least-once; FIFO for exactly-once | At-least-once, exactly-once |
| Routing | Advanced routing (exchanges) | Basic queue mechanism | Partition-based |
| Management | Web UI, plugins | Managed by AWS | Requires external tools |
| Setup | Moderate complexity | Easy (managed service) | High complexity |
| Scalability | Moderate to high | High | Very high |
| Latency | Low | Low (variable) | Low |
| Throughput | Moderate to high | High | Very high |
| Integration | Good library support, more setup | Excellent with AWS, easy integration | Good library support, complex setup |

RabbitMQ is suitable if we need a flexible, protocol-agnostic message broker with advanced routing capabilities and moderate to high throughput. Amazon SQS is ideal if we prefer a fully managed service with high scalability, easy integration with AWS, and minimal setup overhead. Apache Kafka is the best choice if we need to handle very high throughput, long-term message storage, and integration with big data ecosystems, but are prepared to manage a more complex setup.
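As a concrete illustration of the "minimal setup overhead" point for SQS: producing and consuming is a handful of boto3 calls, with no broker of our own to run or patch (the queue URL and region below are hypothetical):

```python
import boto3

sqs = boto3.client("sqs", region_name="us-gov-west-1")  # hypothetical region
queue_url = "https://sqs.us-gov-west-1.amazonaws.com/123456789012/example-queue"

# Producer: one call to enqueue.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"event": "probe"}')

# Consumer: long-poll, process, then delete each message.
resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    print(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```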

But since we don't see any data loss from the blue/green deployment, do we need a messaging service at all?