gigascience / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0
High availability of database service: cross AZ resilience and off-site backup #1011

Open rija opened 2 years ago

rija commented 2 years ago

User Story

As a developer
I want a standard operating procedure (SOP) for restoring the database on the AWS infrastructure in the Upstream Gitlab group
So that I can restore the database if it fails for the GigaDB website

Acceptance criteria

Given the RDS instance hosting the database has failed
When I perform the SOP
Then the database is deployed in RDS in another AWS availability zone

Given the RDS backups of the GigaDB database are not available
When I perform the SOP
Then the database is restored from backups stored in another location

Additional information

If a region or availability zone goes down, it's not just the database, it's the whole stack deployed in that AZ.

0) transfer gigadb.org from Alicloud to Route53

1) TODO in case of region failure:

2) TODO in case of AZ failure (within a region):

We also keep database backups on S3, which does not depend on any single AZ (a bucket still lives in one region unless cross-region replication is set up).
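As a starting point for the "restore from backups stored in another location" criterion, here is a minimal Python sketch of picking the most recent backup from S3. It assumes boto3 is available; the `.backup` suffix and the `bucket`, `prefix`, and `destination` parameters are hypothetical, since this issue does not say how the backups are named or where they live.

```python
from datetime import datetime, timezone


def latest_backup_key(objects, suffix=".backup"):
    """Return the key of the most recently modified backup object, or None.

    `objects` has the shape of the "Contents" list returned by S3
    list_objects_v2: dicts with at least "Key" and "LastModified".
    The ".backup" suffix is an assumption, not the project's real naming.
    """
    candidates = [o for o in objects if o["Key"].endswith(suffix)]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o["LastModified"])["Key"]


def fetch_latest_backup(bucket, prefix, destination):
    """Download the newest backup under `prefix` to the local `destination` path."""
    import boto3  # imported lazily so latest_backup_key works without AWS access

    s3 = boto3.client("s3")
    objects = []
    # Paginate because a listing returns at most 1000 keys per page.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects.extend(page.get("Contents", []))

    key = latest_backup_key(objects)
    if key is None:
        raise RuntimeError(f"no backup found under s3://{bucket}/{prefix}")
    s3.download_file(bucket, key, destination)
    return key
```

The selection logic is kept separate from the AWS calls so that it can be checked without credentials. Loading the downloaded file into a new RDS instance is left to the SOP.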

rija commented 1 year ago

Hi @pli888, @kencho51,

It is now easy to enable a Multi-AZ deployment with a single standby through the AWS console: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html

We should still look into how to do this through Terraform
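Until the Terraform side is worked out, the same change can be scripted. The sketch below uses boto3's `modify_db_instance` call with `MultiAZ=True`. The function names and the `db_instance_id` and `region` parameters are hypothetical. In Terraform, the corresponding setting should be the `multi_az` argument of the `aws_db_instance` resource, but that has not been tried against our configuration.

```python
def multi_az_request(db_instance_id, apply_immediately=False):
    """Build the parameters for an RDS modify_db_instance call that enables Multi-AZ."""
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MultiAZ": True,
        # When False, the change waits for the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


def enable_multi_az(db_instance_id, region, apply_immediately=False):
    """Ask RDS to convert an existing instance to a Multi-AZ deployment."""
    import boto3  # imported lazily so multi_az_request works without AWS access

    rds = boto3.client("rds", region_name=region)
    return rds.modify_db_instance(
        **multi_az_request(db_instance_id, apply_immediately)
    )
```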

rija commented 1 year ago

Hi @pli888, @kencho51,

We should also consider deploying read replicas: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PostgreSQL.Replication.ReadReplicas.html

We still need to look into how to automate that with Terraform, but that should be a separate user story rather than part of this one.
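For whoever picks up that story, a rough Python sketch of creating a read replica with boto3's `create_db_instance_read_replica`. The function names and identifiers are hypothetical, and the optional availability zone is only there to show how a replica could be placed away from the source. In Terraform, the equivalent is expected to be the `replicate_source_db` argument of `aws_db_instance`.

```python
def read_replica_request(source_id, replica_id, availability_zone=None):
    """Build the parameters for an RDS create_db_instance_read_replica call."""
    params = {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }
    if availability_zone:
        # Optional: pin the replica to a specific AZ, e.g. one other than the source's.
        params["AvailabilityZone"] = availability_zone
    return params


def create_read_replica(source_id, replica_id, region, availability_zone=None):
    """Create a read replica of an existing RDS instance."""
    import boto3  # imported lazily so read_replica_request works without AWS access

    rds = boto3.client("rds", region_name=region)
    return rds.create_db_instance_read_replica(
        **read_replica_request(source_id, replica_id, availability_zone)
    )
```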