alan-turing-institute / data-safe-haven


Restore backup folder #2100

Open jemrobinson opened 1 month ago

jemrobinson commented 1 month ago

:white_check_mark: Checklist

:strawberry: Suggested change

The v4 release series had a backup folder; we should do the same.

:steam_locomotive: How could this be done?

JimMadge commented 1 month ago

JSON description of a non-working backup instance.

This suggests the role assignment may be missing the necessary permissions (I recall that there were permissions specifically associated with Azure Backup).

```json
{
  "properties": {
    "friendlyName": "BlobBackupSensitiveData",
    "dataSourceInfo": {
      "resourceID": "/subscriptions/3f1a8e26-eae2-4539-952a-0a6184ec248a/resourceGroups/shm-daimyo-sre-hojo-rg/providers/Microsoft.Storage/storageAccounts/shdaisrehojsensitivedata",
      "resourceUri": "/subscriptions/3f1a8e26-eae2-4539-952a-0a6184ec248a/resourceGroups/shm-daimyo-sre-hojo-rg/providers/Microsoft.Storage/storageAccounts/shdaisrehojsensitivedata",
      "datasourceType": "Microsoft.Storage/storageAccounts/blobServices",
      "resourceName": "shdaisrehojsensitivedata",
      "resourceType": "Microsoft.Storage/storageAccounts",
      "resourceLocation": "uksouth",
      "objectType": "Datasource"
    },
    "policyInfo": {
      "policyId": "/subscriptions/3f1a8e26-eae2-4539-952a-0a6184ec248a/resourceGroups/shm-daimyo-sre-hojo-rg/providers/Microsoft.DataProtection/backupVaults/shm-daimyo-sre-hojo-bv-backup/backupPolicies/backup-policy-blobs"
    },
    "protectionStatus": {
      "status": "ProtectionError",
      "errorDetails": {
        "message": "Appropriate permissions to perform the operation is missing.",
        "recommendedAction": [
          "Grant appropriate permissions to perform this operation as mentioned at https://aka.ms/UserErrorMissingRequiredPermissions and retry the operation."
        ],
        "code": "UserErrorMissingRequiredPermissions",
        "target": "",
        "isRetryable": false,
        "isUserError": false,
        "properties": {
          "ActivityId": "dac6e9f0-196b-4a88-934b-7452a078d301"
        }
      }
    },
    "currentProtectionState": "ProtectionError",
    "protectionErrorDetails": {
      "message": "Appropriate permissions to perform the operation is missing.",
      "recommendedAction": [
        "Grant appropriate permissions to perform this operation as mentioned at https://aka.ms/UserErrorMissingRequiredPermissions and retry the operation."
      ],
      "code": "UserErrorMissingRequiredPermissions",
      "target": "",
      "isRetryable": false,
      "isUserError": false,
      "properties": {
        "ActivityId": "dac6e9f0-196b-4a88-934b-7452a078d301"
      }
    },
    "provisioningState": "Succeeded",
    "objectType": "BackupInstance"
  },
  "id": "/subscriptions/3f1a8e26-eae2-4539-952a-0a6184ec248a/resourceGroups/shm-daimyo-sre-hojo-rg/providers/Microsoft.DataProtection/backupVaults/shm-daimyo-sre-hojo-bv-backup/backupInstances/backup-instance-blobs",
  "name": "backup-instance-blobs",
  "type": "Microsoft.DataProtection/backupVaults/backupInstances"
}
```
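If so, the fix is presumably a role assignment along these lines (an untested sketch using the azure-native Pulumi provider; all IDs are placeholders, and the role GUID is worth double-checking against the Azure built-in roles list):

```python
# Untested sketch: grant the backup vault's system-assigned identity the
# built-in "Storage Account Backup Contributor" role on the storage account,
# which is what UserErrorMissingRequiredPermissions appears to be asking for.
from pulumi_azure_native import authorization

# Placeholder: principal ID of the backup vault's system-assigned identity.
VAULT_PRINCIPAL_ID = "00000000-0000-0000-0000-000000000000"

# Placeholder: resource ID of the storage account to be backed up.
STORAGE_ACCOUNT_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<account-name>"
)

# GUID of the built-in "Storage Account Backup Contributor" role
# (please verify against the Azure built-in roles documentation).
ROLE_DEFINITION_ID = (
    "/providers/Microsoft.Authorization/roleDefinitions/"
    "e5e2a7ff-d759-4cd2-bb51-3152d37e2eb1"
)

authorization.RoleAssignment(
    "backup_vault_storage_access",
    principal_id=VAULT_PRINCIPAL_ID,
    principal_type=authorization.PrincipalType.SERVICE_PRINCIPAL,
    role_definition_id=ROLE_DEFINITION_ID,
    scope=STORAGE_ACCOUNT_ID,
)
```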
jemrobinson commented 1 month ago

OK, the following things are needed for backup to work (see here)

Some questions @JimMadge:

  1. Are we happy to make these changes to the storage account that has /ingress and /egress in it, or would we rather do this somewhere else?
  2. What do we actually want to back up? Which of /home, /ingress, /egress, /shared should we be backing up?
  3. Are we happy with running e.g. rsync daily to copy whichever subset of the above directories we want to back up (see the sketch at the end of this comment)? Would losing file permissions/ownership be a problem?

Depending on what we think, I'll either write something minimal that could target v5.0.0, or make a more major change that targets v5.1.0.
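For (3), the kind of daily copy I have in mind is roughly this (untested sketch; the directory choices and paths are illustrative):

```python
# Untested sketch of the daily copy step in (3); all paths are illustrative.
import subprocess

# Whichever subset of directories we decide to back up (question 2 above).
SOURCES = ["/home", "/shared", "/egress"]
BACKUP_ROOT = "/backup"  # hypothetical mount of the backup storage account

for source in SOURCES:
    subprocess.run(
        [
            "rsync",
            "--archive",  # tries to preserve permissions/ownership/timestamps,
                          # which may not survive on a blob-backed target (question 3)
            "--delete",   # keep the mirror exact, so deletions propagate too
            f"{source}/",
            f"{BACKUP_ROOT}{source}/",
        ],
        check=True,
    )
```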

JimMadge commented 1 month ago
> • the storage account needs to be STORAGE_V2 (not BLOCK_BLOB_STORAGE)
> • we need to disable HNS and the NFSv3 flag (not sure whether this disables NFS or not)

I think this means we cannot back up those. HNS is required for NFSv3, and I think storage v2 doesn't support NFSv3.

I think we shouldn't back up /ingress. It is read-only inside SREs, and it would be better to delete all copies than to forget to delete a copy and risk it leaking.

My guess would be we want to back up,

If we are going to use a command-line tool instead of Azure resources, I think we should go with something like borg, which will handle encryption, de-duplication and compression (see the sketch below).
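Roughly like this (untested sketch; the repository path is illustrative, and the passphrase should really come from a secret store):

```python
# Untested sketch of a borg workflow: encrypted, de-duplicated, compressed
# archives. Repository location and passphrase handling are illustrative only.
import datetime
import os
import subprocess

REPO = "/backup/borg-repo"  # hypothetical repository location
env = {**os.environ, "BORG_PASSPHRASE": "<from-a-secret-store>"}

# One-off: initialise an encrypted repository.
subprocess.run(["borg", "init", "--encryption=repokey", REPO], env=env, check=True)

# Daily: create a compressed, de-duplicated archive of the chosen directories.
archive = f"{REPO}::daily-{datetime.date.today().isoformat()}"
subprocess.run(
    ["borg", "create", "--compression", "zstd", archive, "/home", "/shared"],
    env=env,
    check=True,
)
```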

jemrobinson commented 1 month ago

I'm suggesting using a command line tool to copy the files from a storage account that we can't back up (e.g. things we're mounting over NFS) into a storage account that we can back up.

I think we probably want the backup account to maintain the file structure of the things we're backing up, so we can easily restore single files or folders from backup. I could be convinced that it's better to store binary dumps from an archiving tool if there's a sensible restore-from-backup workflow that doesn't involve admins trying to run commands through the serial console!

JimMadge commented 1 month ago

Oh I see.

I think that would still require some manual intervention though. If we had /backup managed by Azure Backup Vault, we could restore that directory, but we would still need to propagate any rollback to /output, /shared, etc.

It feels more robust to have a one-step process like borgmatic restore than to click some things in the portal and then run a script.

I'm sure we could have a CLI entrypoint which runs the restore commands.
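Something like this, say (illustrative only; none of these commands exist in data-safe-haven today, the repository path is made up, and I've used typer just for the sketch):

```python
# Illustrative sketch of a one-step restore entrypoint.
import os
import subprocess

import typer

app = typer.Typer()

REPO = "/backup/borg-repo"  # hypothetical repository location


@app.command()
def restore(archive: str, destination: str = "/") -> None:
    """Extract ARCHIVE from the backup repository into DESTINATION in one step."""
    os.chdir(destination)  # borg extract writes into the working directory
    subprocess.run(["borg", "extract", f"{REPO}::{archive}"], check=True)


if __name__ == "__main__":
    app()
```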

jemrobinson commented 1 month ago

Here are some relevant DSPT requirements:

> How does your organisation make sure that there are working backups of all important data and information?
>
> Are backups routinely tested to make sure that data and information can be restored?
>
> Are your backups kept separate from your network ('offline'), or in a cloud service designed for this purpose?

I think Azure Backup meets the last one, but if we use borg we would need to work out how to store these "separate from our network".

jemrobinson commented 1 month ago

duplicity might be an option. Here's a guide to backing up to Azure storage.
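Based on that guide, usage would be something like this (untested sketch; the container name and secrets are placeholders, and older duplicity versions read AZURE_ACCOUNT_NAME/AZURE_ACCOUNT_KEY instead of a connection string):

```python
# Untested sketch of the duplicity option: GPG-encrypted, incremental backups
# straight into an Azure blob container. Secrets here are placeholders and
# should come from a secret store.
import os
import subprocess

env = {
    **os.environ,
    "AZURE_CONNECTION_STRING": "<storage-account-connection-string>",
    "PASSPHRASE": "<gpg-passphrase>",
}

# Incremental, encrypted backup of /shared into the 'backups' container.
subprocess.run(["duplicity", "/shared", "azure://backups"], env=env, check=True)
```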

JimMadge commented 1 month ago

> Are your backups kept separate from your network ('offline'), or in a cloud service designed for this purpose?

We should be careful with that; I think there would often be a legal obligation not to transfer the data outside of our network.

This is one of the places where I feel that DSPT wasn't designed for TREs. I think it is talking about off-site backup, as in "If your building burned down, how would you make sure you don't lose everyone's medical records?". However, we don't expect to archive or curate data. We expect to permanently delete everything soon.

In our case, I think the equivalent of off-site is "If you tear down the workspaces and storage accounts, will you also lose the backups?" and "If the datacentre burns down, would you lose the backups?". We could achieve that by using different resources and redundant storage.

jemrobinson commented 1 month ago

I was assuming this means that we'd need to either explicitly store backups at another datacentre location or use a very high-redundancy storage account SKU.
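e.g. something like this for the backup account (untested sketch; names and location are placeholders):

```python
# Illustrative sketch of a geo-zone-redundant backup storage account
# (Pulumi, azure-native provider); resource names are placeholders.
from pulumi_azure_native import storage

storage.StorageAccount(
    "backup_storage_account",
    resource_group_name="<backup-resource-group>",
    kind=storage.Kind.STORAGE_V2,  # required for Azure blob backup (see above)
    sku=storage.SkuArgs(name=storage.SkuName.STANDARD_GZRS),  # geo-zone-redundant
    location="uksouth",
)
```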

JimMadge commented 1 month ago

Yes, I think that is sensible and best practice.