DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Backup important artifacts into another AWS region #4522

Open theathorn opened 2 years ago

theathorn commented 2 years ago

Backup

The CloudWatch logs are exported (#3791, #4314) to the vault directly and no additional work is needed for them in this ticket.

melainalegaspi commented 2 years ago

@hannes-ucsc to come up with design.

hannes-ucsc commented 1 year ago

We'll be using cross-account backups in AWS Backup for this. It supports backing up EBS volumes and S3 buckets, among other services.

The cross-account setup is necessary to prevent accidental deletion or manipulation of backups. The backup will run periodically, once a week. This requires AWS Organizations and the designation of a source and a target account. The source accounts are the accounts that host the system; the target account will be out of our control. For that reason, an organization admin will need to set this up, monitor it, and maintain it.
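To make the weekly cross-account schedule concrete, here is a minimal sketch of what the `BackupPlan` argument to AWS Backup's `CreateBackupPlan` API could look like. The plan name, vault names, account IDs, the Sunday 05:00 UTC schedule, and the 90-day retention are all placeholders invented for illustration; none of them have been decided in this ticket. The helper `weekly_cross_account_plan` is hypothetical as well.

```python
import json


def weekly_cross_account_plan(plan_name, source_vault_name, target_vault_arn,
                              retention_days=90):
    """Build the `BackupPlan` structure for AWS Backup's CreateBackupPlan API.

    All argument values are supplied by the caller; the defaults here are
    assumptions, not settings agreed on in this ticket.
    """
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "weekly",
                # Vault in the source account that receives the recovery point
                "TargetBackupVaultName": source_vault_name,
                # AWS cron syntax: minute hour day-of-month month day-of-week year
                "ScheduleExpression": "cron(0 5 ? * SUN *)",
                "Lifecycle": {"DeleteAfterDays": retention_days},
                # Copy each recovery point to the vault in the target account
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": target_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retention_days},
                    }
                ],
            }
        ],
    }


# Placeholder names and account IDs, for illustration only
plan = weekly_cross_account_plan(
    plan_name="example-weekly",
    source_vault_name="example-source-vault",
    target_vault_arn=(
        "arn:aws:backup:us-east-1:222222222222:backup-vault:example-target-vault"
    ),
)
print(json.dumps(plan, indent=2))
```

With boto3, the resulting dictionary could be passed as `boto3.client('backup').create_backup_plan(BackupPlan=plan)`. The Organizations-level settings and the access policy on the target vault are not covered here, since those would be up to the organization admin.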

We'd have to test how EBS volume backups work with AWS Backup. Instead of backing up the volume, I'd rather back up the snapshots that an operator creates before performing a GitLab update, but I have not found any documentation on backing up specific snapshots instead of volumes. I'm sure the volume backup will involve creating a snapshot, but that may be transparent to us. It's also unclear to me how AWS Backup can ensure that a volume is consistent when it is backed up while attached to a running instance.

Important: The setup of vault accounts and buckets is defined in https://github.com/databiosphere/azul/issues/4314.

hannes-ucsc commented 1 year ago

Spike again to test backing up buckets and EBS volumes in dev, minus the cross-account setup.
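For the spike, the resources to back up could be listed explicitly by ARN in a backup selection. The following is only a sketch of the `BackupSelection` argument to AWS Backup's `CreateBackupSelection` API; the helper `spike_selection`, the role, the bucket name, and the volume ID are hypothetical placeholders, not actual dev resources.

```python
def spike_selection(role_arn, bucket_names, volume_arns):
    """Build the `BackupSelection` structure for CreateBackupSelection.

    Buckets are given by name and converted to ARNs; EBS volumes are
    expected as full ARNs.
    """
    resources = [f"arn:aws:s3:::{name}" for name in bucket_names]
    resources.extend(volume_arns)
    return {
        "SelectionName": "spike",
        # Role that AWS Backup assumes to read the selected resources
        "IamRoleArn": role_arn,
        "Resources": resources,
    }


# Placeholder identifiers, for illustration only
selection = spike_selection(
    role_arn="arn:aws:iam::111111111111:role/example-backup-role",
    bucket_names=["example-bucket"],
    volume_arns=[
        "arn:aws:ec2:us-east-1:111111111111:volume/vol-0123456789abcdef0"
    ],
)
print(selection["Resources"])
```

With boto3, this could be attached to an existing plan via `boto3.client('backup').create_backup_selection(BackupPlanId=..., BackupSelection=selection)`. Since the spike excludes the cross-account setup, the plan it attaches to would not need any copy actions. Note that AWS Backup requires versioning to be enabled on S3 buckets it backs up.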