US-EPA-CAMD / easey-ui

Project Management repo for EPA Clean Air Markets Division (CAMD) Business Suite of applications
MIT License

Develop plan for PostgreSQL migration #6395

Open alangmaid opened 2 months ago

alangmaid commented 2 months ago

Email-msg-from-cloud.gov.pdf

Expect to develop a plan to implement option #1 described in the email, including dropping the PostgreSQL v12 databases for dev, test, perf, and beta.

maxdiebold-erg commented 3 days ago

Migration Steps

  1. Ensure SSL is configured for all applications that will connect to the database. This configuration is currently in place in the 2.0 branches of the APIs, but it must be enabled in the CAMPD release branches.
  2. Create a new RDS service. Use the xlarge-gp-psql plan in Staging and the xlarge-gp-psql-redundant plan in Production.
cf create-service aws-rds xlarge-gp-psql[-redundant] camd-pg-db-v15 -c '{ "storage": 1000, "enable_pg_cron": true }'
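Provisioning can take a while; the status of the create operation can be checked before continuing (using the service name above):
cf service camd-pg-db-v15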
  3. Create a service key:
cf create-service-key camd-pg-db-v15 camd-pg-db-key
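The tunnel and dump/restore steps below need the credentials from this key; they can be read back with the command below (the output should include the host, port, username, password, and database name):
cf service-key camd-pg-db-v15 camd-pg-db-key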
  4. Establish separate SSH tunnels to the legacy (version 12) RDS instance and the new (version 15) RDS instance.
cf ssh -NL <source-tunnel-port>:<source-rds-url>:5432 auth-api
cf ssh -NL <destination-tunnel-port>:<destination-rds-url>:5432 auth-api
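Before starting the long-running copy, it can be worth confirming each tunnel works and points at the expected server version (same placeholders as above; enter the password when prompted):
psql -h localhost -p <source-tunnel-port> -U <source-username> -d <source-database> -c 'SELECT version();'
psql -h localhost -p <destination-tunnel-port> -U <destination-username> -d <destination-database> -c 'SELECT version();'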
  5. Export data from the version 12 instance and load it into the version 15 instance (enter the passwords when prompted):
pg_dump -Fc --no-acl --no-owner -p <source-tunnel-port> -U <source-username> -h localhost <source-database> \
| pg_restore --clean --no-owner --no-acl -p <destination-tunnel-port> -U <destination-username> -h localhost -d <destination-database>
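Once the restore finishes, a rough sanity check is to run the query below against both instances and compare (n_live_tup is a statistics estimate, so expect small differences):
psql -h localhost -p <source-tunnel-port> -U <source-username> -d <source-database> \
  -c 'SELECT schemaname, relname, n_live_tup FROM pg_stat_user_tables ORDER BY 1, 2;'
psql -h localhost -p <destination-tunnel-port> -U <destination-username> -d <destination-database> \
  -c 'SELECT schemaname, relname, n_live_tup FROM pg_stat_user_tables ORDER BY 1, 2;'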
  6. Connect to the postgres database in the new RDS instance with psql or another client tool and follow the instructions in section 10.2 of the runbook to finish setting up pg_cron and schedule the necessary jobs.
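The actual job definitions come from the runbook; purely as a shape reference (the job name and schedule below are invented), scheduling and verifying a pg_cron job looks like this:
psql -h localhost -p <destination-tunnel-port> -U <destination-username> -d postgres \
  -c "SELECT cron.schedule('example-nightly-vacuum', '0 3 * * *', 'VACUUM ANALYZE');"
psql -h localhost -p <destination-tunnel-port> -U <destination-username> -d postgres \
  -c "SELECT jobid, jobname, schedule FROM cron.job;"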

  7. Coordinate with the Informatica team to disable CDC jobs before moving on to the next step.

  8. Update each application to use the new service. For each affected Cloud.gov application, run the following commands:

cf bind-service <app> camd-pg-db-v15
cf unbind-service <app> camd-pg-db
cf restage <app>
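If the application list is long, the same three commands can be scripted; a sketch (the app names are placeholders, not the real list):
for app in <app-1> <app-2> <app-3>; do
  cf bind-service "$app" camd-pg-db-v15
  cf unbind-service "$app" camd-pg-db
  cf restage "$app"
done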
  9. Rename the RDS services:
cf rename-service camd-pg-db camd-pg-db-v12
cf rename-service camd-pg-db-v15 camd-pg-db
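A quick way to confirm both renames took effect:
cf services | grep camd-pg-db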
  10. Coordinate with the Informatica team to enable CDC jobs on the new RDS instance.

  11. Email Cloud.gov support to verify that automated backups are enabled on the new service.

  12. After thorough testing against the new database, delete the old RDS instance:

cf delete-service-key camd-pg-db-v12 camd-pg-db-key
cf delete-service camd-pg-db-v12
maheese commented 1 day ago

@maxdiebold-erg Looks good. Here are a few comments:

  1. For production, the service plan needs to be "xlarge-gp-psql-redundant"
  2. Need to ensure that the new instance is created with the same parameters as the existing database. Add enable_pg_cron and additional storage. I believe we currently have 1TB allocated to production.
  3. For production, this process will need to be coordinated with the Informatica team, as there are CDC jobs running that keep the data up to date with the NCC. The Informatica team will need to disable the jobs prior to the migration and then re-enable them with the new DB connection information after the migration.
  4. Automated backups are supposed to be enabled by default on Cloud.gov AWS RDS instances. There is no way to enable them through the broker. We could send an email to Cloud.gov support to have them check and ensure that automated backups are enabled.
maxdiebold-erg commented 1 day ago

@maheese Thanks, I will make those updates. I get the error "This service does not support fetching service instance parameters" when trying to view the service parameters; I know the Staging DB will need pg_cron, but should it have 1TB of storage, too?
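(For reference, the failing lookup here is presumably cf CLI v8's parameter fetch, which the aws-rds broker apparently does not support:)
cf service camd-pg-db --params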

maheese commented 1 day ago

@maxdiebold-erg I've never been able to get the view service parameters to work either. I just discussed the size of the staging DB with @mark-hayward-erg and we arrived at 1TB for staging too.

maheese commented 2 hours ago

@maxdiebold-erg I had another thought about this after talking to @mark-hayward-erg. Running this on a client machine might take a while over the SSH tunnel. We've exported data to an S3 bucket before by using the apt-buildpack to load the Postgres client tools into a Cloud Foundry app and then running the export as a Cloud Foundry task, which keeps everything in the Cloud.gov environment. I think the same approach could be used with what you've written to pipe the output of the export straight into the import. Here's the project we developed to do this: https://github.com/USEPA/cf-pg-db-tasks.
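For reference, a minimal sketch of that pattern (the app name, task name, and apt package are assumptions; the real implementation is in the repo above):
# apt.yml consumed by the apt-buildpack to install the Postgres client tools
# into the Cloud Foundry app (a version-15 client package may require adding
# the PGDG apt repo in apt.yml; "postgresql-client" is the generic package):
cat > apt.yml <<'EOF'
---
packages:
  - postgresql-client
EOF
# Then run the copy as a one-off task inside Cloud.gov (cf CLI v7+ syntax;
# "pg-db-tasks" is a placeholder app name, and credentials are passed via
# PGPASSWORD since a task cannot answer interactive prompts):
cf run-task pg-db-tasks --name pg-migrate \
  --command "PGPASSWORD=<source-password> pg_dump -Fc --no-acl --no-owner -h <source-rds-url> -U <source-username> <source-database> | PGPASSWORD=<destination-password> pg_restore --clean --no-owner --no-acl -h <destination-rds-url> -U <destination-username> -d <destination-database>"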