The current Glacier archiving script was written 12+ years ago, back when Glacier was a standalone service built around "Vaults". These days, Glacier is just a storage class in S3. The script also keeps its own SQLite database, which is not ideal, since MAVEN already has a database available for us to use that gets backed up regularly.
I think we should stop using the vault and just back up everything to S3 with the GLACIER storage class. I don't know yet whether it will be easier to move the existing archives from the vault into S3 within AWS, or to re-sync everything from our server to S3 again. On EMM, archiving the data is literally just this one command:
aws s3 sync s3://data-sdc-emmsdc-mbrsc s3://data-sdc-emmsdc-backup --storage-class GLACIER
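For the re-sync option, the EMM approach could likely be reused on MAVEN with a local directory as the source. When syncing a local directory to S3, `aws s3 sync` only uploads files that are missing from the bucket or whose size or modification time differ, so rerunning it (for example from cron) should only transfer new or changed files. A minimal Python wrapper, assuming the AWS CLI is installed and configured on the server; `SOURCE_DIR` and `BACKUP_URI` are hypothetical placeholders, since the real MAVEN path and backup bucket aren't named here:

```python
import shlex
import subprocess

# Hypothetical placeholders: the real MAVEN data path and backup bucket
# are not specified in this ticket.
SOURCE_DIR = "/maven/data"
BACKUP_URI = "s3://example-maven-backup"


def build_sync_command(source, dest, storage_class="GLACIER", dry_run=False):
    """Return the argv for an `aws s3 sync` that stores uploads in `storage_class`."""
    cmd = ["aws", "s3", "sync", source, dest, "--storage-class", storage_class]
    if dry_run:
        # --dryrun makes the CLI print what it would upload without transferring anything
        cmd.append("--dryrun")
    return cmd


def run_sync(dry_run=True):
    """Run the sync against the placeholder paths; defaults to a dry run."""
    cmd = build_sync_command(SOURCE_DIR, BACKUP_URI, dry_run=dry_run)
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the AWS CLI exits with an error
    subprocess.run(cmd, check=True)
```

Calling `run_sync()` with its default `dry_run=True` only lists the planned uploads, which seems like a sensible first step before committing to a full re-upload.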
Once that is done, we'll also need a script that lets us restore files from the Glacier backup. One already exists, but I suspect it uses outdated AWS APIs as well as the SQLite database.
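On the restore side, objects stored with `--storage-class GLACIER` can't be downloaded directly. S3 first needs a RestoreObject request, which makes a temporary readable copy available for a chosen number of days, and the object's HeadObject response reports progress in its `Restore` field. A minimal Python sketch of that flow, assuming boto3 is installed and AWS credentials are configured; the 7-day window and `Standard` retrieval tier are placeholder defaults, not decisions:

```python
import re


def parse_restore_header(value):
    """Interpret the `Restore` field from S3 HeadObject.

    Returns "not-requested", "in-progress", or "restored".
    """
    if not value:
        # No Restore field: no restore requested, or a previous one has expired
        return "not-requested"
    match = re.search(r'ongoing-request="(true|false)"', value)
    if match is None:
        raise ValueError(f"unrecognised Restore value: {value!r}")
    return "in-progress" if match.group(1) == "true" else "restored"


def _restore_status(s3, bucket, key):
    return parse_restore_header(s3.head_object(Bucket=bucket, Key=key).get("Restore"))


def request_restore(bucket, key, days=7, tier="Standard"):
    """Start a restore if none is pending; return the resulting status."""
    import boto3  # imported lazily so parse_restore_header works without boto3

    s3 = boto3.client("s3")
    status = _restore_status(s3, bucket, key)
    if status == "not-requested":
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
        )
        return "in-progress"
    return status


def download_if_restored(bucket, key, filename):
    """Download the object only once its temporary copy is available."""
    import boto3

    s3 = boto3.client("s3")
    if _restore_status(s3, bucket, key) == "restored":
        s3.download_file(bucket, key, filename)
        return True
    return False
```

Because S3 itself reports whether a restore is pending or finished, a restore script along these lines may not need a local database to track retrieval jobs.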
That said, it would be worth looking into everything the existing code does. It may do some special tracking in that SQLite database that I'm missing here, and it may also rely on the database when restoring files.
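One plausible reason for the SQLite database: archives in a Glacier vault are identified only by an AWS-assigned archive ID, and the vault inventory is refreshed only about once a day, so vault-era tools often kept their own index of which file went into which archive. If that is what this database holds, it may be the only record of where each file lives in the vault. A quick way to check is to dump its tables, columns and row counts. The sketch below uses only Python's standard library and opens the file read-only; the database path is a placeholder, since the ticket doesn't say where the file lives:

```python
import pathlib
import sqlite3


def open_read_only(db_path):
    """Open a SQLite file read-only, so inspecting it can't modify it."""
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def describe_schema(conn):
    """Return {table: {"columns": [(name, type), ...], "rows": count}}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        schema[table] = {"columns": [(c[1], c[2]) for c in cols], "rows": count}
    return schema
```

Usage would be along the lines of `describe_schema(open_read_only("/path/to/glacier.db"))`, with the real path substituted.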
Feel free to break this ticket down into smaller tasks as we investigate the issue more.