MAVENSDC / maven-sdc

This contains the back-end code to the MAVEN SDC data server

Revamp the AWS Glacier archiving #60

Open bryan-harter opened 3 days ago

bryan-harter commented 3 days ago

The current Glacier archiving script was written 12+ years ago, back when Glacier was built around the concept of "Vaults". These days, Glacier is just a storage class in S3. The script also uses its own SQLite database, which is not ideal since MAVEN already has a database available for us to use that gets regularly backed up.

I think we should stop using the vault and just back up everything to the S3 Glacier storage class. I don't know whether it will be easier to move things from the vault onto S3 within AWS, or to re-sync everything from our server to AWS S3. On EMM, all we do to archive the data is run the command `aws s3 sync s3://data-sdc-emmsdc-mbrsc s3://data-sdc-emmsdc-backup --storage-class GLACIER`.
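If we end up re-syncing from our server, a thin Python wrapper around the same CLI call could look roughly like the sketch below. The function names, the local path, and the bucket name are hypothetical placeholders, not anything that exists in this repo; only `aws s3 sync`, `--storage-class`, and `--dryrun` are real CLI features. It defaults to a dry run so nothing is uploaded by accident.

```python
import shlex
import subprocess


def build_sync_command(source, destination, storage_class="GLACIER", dry_run=True):
    """Build an `aws s3 sync` argument list that uploads with the given storage class."""
    command = ["aws", "s3", "sync", source, destination,
               "--storage-class", storage_class]
    if dry_run:
        # --dryrun only prints what would be transferred; nothing is uploaded.
        command.append("--dryrun")
    return command


def run_sync(source, destination, **kwargs):
    """Run the sync; raises subprocess.CalledProcessError on a non-zero exit code."""
    command = build_sync_command(source, destination, **kwargs)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)


# Hypothetical usage (the path and bucket name are placeholders):
# run_sync("/path/to/maven/data", "s3://hypothetical-maven-backup", dry_run=True)
```

Keeping the command construction separate from the `subprocess` call means we can unit test the arguments without touching AWS.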

Once the above is complete, we'll also need to write a script that lets us restore files from the Glacier backup. One already exists, but I suspect it uses outdated AWS APIs as well as the SQLite database.
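For reference, restoring an object stored in the S3 Glacier storage class is a two-step process: request a temporary restored copy, then poll until it is available. A minimal sketch with boto3 might look like the following. It assumes the caller passes in a client created with `boto3.client("s3")`; the function names and default values are my own placeholders, and this is not what the existing restore script does.

```python
def request_restore(s3_client, bucket, key, days=7, tier="Bulk"):
    """Ask S3 to make a temporary copy of an archived object available.

    `tier` is one of "Standard", "Bulk", or "Expedited"; `days` is how long
    the restored copy stays readable before it expires.
    """
    restore_request = {
        "Days": days,
        "GlacierJobParameters": {"Tier": tier},
    }
    s3_client.restore_object(Bucket=bucket, Key=key, RestoreRequest=restore_request)
    return restore_request


def restore_status(s3_client, bucket, key):
    """Return the raw 'Restore' field from head_object, or None if it is absent.

    While a restore is in progress the field contains ongoing-request="true".
    """
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response.get("Restore")
```

Passing the client in as an argument keeps the functions easy to test with a stub, and leaves credential and region handling to whatever calls them.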

That said, it might be worth looking into what all the code does. Maybe it does some special tracking with that SQLite database that I'm missing here. It might also use the SQLite database when restoring files.
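As a starting point for that investigation, something like the sketch below would list the tables and columns in the existing SQLite file, so we can see what is being tracked. The function name is a placeholder and I am not assuming anything about the actual schema. It opens the file read-only so the inspection cannot modify the database.

```python
import sqlite3
from pathlib import Path


def describe_sqlite_database(path):
    """Return {table_name: [column_name, ...]} for every table in the file."""
    # mode=ro opens the database read-only and fails if the file does not exist.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            columns = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        connection.close()
```

If the schema turns out to be a simple mapping of file paths to archive IDs, that would suggest the tracking can move into the MAVEN database or be dropped entirely once everything lives in S3.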

Feel free to break this ticket down into smaller tasks as we investigate the issue more.