jemrobinson opened 1 month ago
JSON description of a non-working backup instance.
This suggests the role assignment may be missing the necessary permissions (I recall there are permissions specifically associated with Azure Backup).
OK, the following things are needed for backup to work (see here):

- `Storage Account Backup Contributor` permissions on the storage account
- the storage account must be `STORAGE_V2` (not `BLOCK_BLOB_STORAGE`)

Some questions @JimMadge:
- should we run `rsync` daily to copy whichever subset of the above directories we want to back up? Would losing file permissions/ownership be a problem?

Depending on what we think, I'll either write something minimal that could target v5.0.0 or make a more major change that targets v5.1.0.
- the storage account needs to be STORAGE_V2 (not BLOCK_BLOB_STORAGE)
- we need to disable HNS and the NFSv3 flag (not sure whether this disables NFS or not)
I think this means we cannot back up those. HNS is required for NFSv3, and I think Storage V2 doesn't support NFSv3.
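For concreteness, the storage-account and role requirements above might look like this with the Azure CLI. All names and IDs here are placeholders, not our actual resources:

```shell
# --kind StorageV2 satisfies the STORAGE_V2 requirement; omitting
# --enable-hierarchical-namespace and --enable-nfs-v3 leaves HNS and
# NFSv3 disabled, as required.
az storage account create \
    --name shmbackupstorage \
    --resource-group rg-shm-backup \
    --kind StorageV2 \
    --sku Standard_GRS

# Grant the backup vault's managed identity the required role on the account.
az role assignment create \
    --assignee "<backup-vault-principal-id>" \
    --role "Storage Account Backup Contributor" \
    --scope "<storage-account-resource-id>"
```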
I think we shouldn't back up `/ingress`. It is read-only inside SREs, and it would be better to delete all copies than to forget to delete a copy and risk it leaking.
My guess would be that we want to back up:

- `/shared`
- `/egress`
- `/home`
If we are going to use a command-line tool instead of Azure resources, I think we should go with something like `borg`, which handles encryption, de-duplication and compression.
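As a sketch of that workflow, assuming a repository at `/backup/borg-repo` (a placeholder path) and the directory list suggested above:

```shell
# Non-interactive runs would set the passphrase via the environment.
export BORG_PASSPHRASE="<from-a-secret-store>"

# One-off: create an encrypted repository (key stored inside the repo).
borg init --encryption=repokey /backup/borg-repo

# Daily: create a de-duplicated, compressed archive of the target directories.
borg create --compression zstd \
    /backup/borg-repo::'{hostname}-{now:%Y-%m-%d}' \
    /shared /egress /home

# Thin out old archives to bound storage growth.
borg prune --keep-daily 7 --keep-weekly 4 /backup/borg-repo
```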
I'm suggesting using a command line tool to copy the files from a storage account that we can't back up (e.g. things we're mounting over NFS) into a storage account that we can back up.
I think we probably want the backup account to maintain the file structure of the things we're backing up, so we can easily restore single files or folders from backup. I could be convinced that it's better to store binary dumps from an archiving tool if there's a sensible restore-from-backup workflow that doesn't involve admins trying to run commands through the serial console!
Oh I see.
I think that would still require some manual intervention though. If we had `/backup`, which was managed by Azure Backup Vault, we could restore that directory but would still need to propagate any rollback to `/output`, `/shared`, etc.
It feels more robust to have a one-step process like `borgmatic restore` than to click some things in the portal and then run a script.
I'm sure we could have a CLI entrypoint which runs the restore commands.
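Such an entrypoint could be a short script along these lines (the repository path and directory list are assumptions carried over from above, not decided yet):

```shell
# Find the most recent archive in the repository.
latest=$(borg list --last 1 --format '{archive}' /backup/borg-repo)

# Extract it into the current (scratch) directory...
borg extract /backup/borg-repo::"$latest"

# ...then propagate the restored trees back over the live directories.
for dir in shared egress home; do
    rsync -a "./$dir/" "/$dir/"
done
```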
Here are some relevant DSPT requirements:

- How does your organisation make sure that there are working backups of all important data and information?
- Are backups routinely tested to make sure that data and information can be restored?
- Are your backups kept separate from your network ('offline'), or in a cloud service designed for this purpose?
I think Azure Backup meets the last one, but if we use `borg` we would need to work out how to store the backups "separate from our network".
> Are your backups kept separate from your network ('offline'), or in a cloud service designed for this purpose?
We should be careful with that; I think there would often be a legal obligation not to transfer the data outside of our network.
This is one of the places where I feel that DSPT wasn't designed for TREs. I think it is talking about off-site backup as in "If your building burned down, how would you make sure you don't lose everyone's medical records?". However, we don't expect to archive or curate data. We expect to permanently delete everything soon.
In our case, I think the equivalent of off-site is "If you tear down the workspaces and storage accounts, will you also lose the backups?" and "If the datacentre burns down, would you lose the backups?". We could achieve that by using different resources and redundant storage.
I was assuming this means that we'd need to either explicitly store backups at another datacentre location or use a very high redundancy storage account SKU.
Yes I think that is sensible and best practice.
:white_check_mark: Checklist

:strawberry: Suggested change

The v4 release series had a `backup` folder - we should do the same.

:steam_locomotive: How could this be done?