datamade / how-to

📚 Doing all sorts of things, the DataMade way
MIT License

Evaluate migration to RDS for legacy apps that cannot be deployed to Heroku #206

Open hancush opened 3 years ago

hancush commented 3 years ago

When app databases update frequently and/or in a way that is not reproducible (e.g., human data entry), it is desirable to take snapshots at regular intervals so we have a version to restore in case of data loss or corruption. Let's set and document a policy for whether and how often app databases should be backed up. We should also think about a rotation policy for backups, as we probably won't need an infinite back catalog.
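To make the rotation question concrete, here is a minimal sketch of one possible scheme: keep every snapshot from the last few days, plus the newest snapshot from each recent week. The function name `snapshots_to_keep` and the retention numbers (`daily=7`, `weekly=4`) are illustrative assumptions, not a policy we have agreed on.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily=7, weekly=4):
    """Return the snapshot dates a simple daily/weekly rotation would retain.

    Keeps every snapshot younger than `daily` days, plus the newest snapshot
    in each ISO week touched by the last `weekly * 7` days. The defaults are
    placeholder assumptions, not an agreed policy.
    """
    keep = set()
    newest_per_week = {}
    for d in sorted(snapshot_dates):
        age = (today - d).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age < daily:
            keep.add(d)
        if age < weekly * 7:
            # Dates are visited oldest first, so later dates overwrite
            # earlier ones and each week ends up with its newest snapshot.
            newest_per_week[tuple(d.isocalendar())[:2]] = d
    keep.update(newest_per_week.values())
    return sorted(keep)


# Example: 60 consecutive daily snapshots ending on a hypothetical "today".
today = date(2024, 1, 31)
all_snapshots = [today - timedelta(days=i) for i in range(60)]
kept = snapshots_to_keep(all_snapshots, today)
expired = sorted(set(all_snapshots) - set(kept))
```

Anything in `expired` would be a candidate for deletion. Whether daily plus weekly is the right granularity depends on how often each app's data changes.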

hancush commented 3 years ago

Once we have a policy, we should also develop a plan to implement backups for apps we maintain that need them.

hancush commented 3 years ago

@fgregg suggests evaluating whether we want to be in the business of database backups at all. Will leave notes on the pros/cons of RDS for AWS deployments and any relevant notes from the Dedupe.io migration from self-managed Postgres to RDS.

hancush commented 3 years ago

Existing docs on RDS: https://github.com/datamade/how-to/blob/a43b05897bdbd200f643c5f7bb3be794a1915c5a/deployment/aws/rds.md

hancush commented 2 years ago

For this cycle, produce a lightweight recommendation of adoption containing:

hancush commented 2 years ago

I'm migrating BGA to RDS (https://github.com/datamade/bga-payroll/issues/538). Stashing a question here: I've configured one less expensive, less powerful RDS instance for staging and plan to create another, more expensive and more powerful RDS instance for production. However, RDS instances can contain several databases. What should be our policy here? Separate instances for staging and production, or one instance housing both databases?

hancush commented 2 years ago

FWIW, it took me about an hour to migrate the staging database, and that includes pairing with Forest to set up the first instance (learning curve), plus time to back up and restore a several-GB database.

smcalilly commented 8 months ago

> However, RDS instances can contain several databases. What should be our policy here? Separate instances for staging and production, or one instance housing both databases?

For CCFP, we've set up a single instance with the staging and production environments separated by database. This works, but you have to be careful to use the correct connection string.

We did something similar for Agenda Watch on GCP. It works fine.
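One way to reduce the risk of using the wrong connection string on a shared instance is a startup check that the database named in the URL matches the environment the app thinks it is running in. This is only a sketch: the `<app>_<environment>` naming convention, the `check_database_url` helper, and the example URL are all assumptions for illustration, not how CCFP or Agenda Watch are actually configured.

```python
from urllib.parse import urlparse


def check_database_url(database_url, environment):
    """Return the database name if it matches the environment, else raise.

    Assumes (hypothetically) that databases are named `<app>_<environment>`,
    e.g. `myapp_staging` and `myapp_production`.
    """
    # For a URL like postgres://user:pw@host:5432/myapp_staging,
    # the path component is "/myapp_staging".
    db_name = urlparse(database_url).path.lstrip("/")
    if not db_name.endswith(f"_{environment}"):
        # Report only the database name, never the full URL, so that
        # credentials do not end up in logs.
        raise ValueError(
            f"Database {db_name!r} does not look like a {environment} database"
        )
    return db_name


# Hypothetical URL; in a real app this would come from the environment.
url = "postgres://user:secret@db.example.invalid:5432/myapp_staging"
check_database_url(url, "staging")  # passes
```

Calling `check_database_url(url, "production")` with the same URL raises a `ValueError`, which would stop a production process from starting against the staging database.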