jemrobinson opened 1 month ago
JSON description of a non-working backup instance.
This suggests the role assignment may be missing the necessary permissions (I recall there are permissions specifically associated with Azure Backup).
OK, the following things are needed for backup to work (see here):

- `Storage Account Backup Contributor` permissions on the storage account
- the storage account must be `STORAGE_V2` (not `BLOCK_BLOB_STORAGE`)

Some questions @JimMadge:
- should we run `rsync` daily to copy whichever subset of the above directories we want to back up? Would losing file permissions/ownership be a problem?

Depending on what we think, I'll either write something minimal that could target v5.0.0 or make a more major change that targets v5.1.0.
- the storage account needs to be STORAGE_V2 (not BLOCK_BLOB_STORAGE)
- we need to disable HNS and the NFSv3 flag (not sure whether this disables NFS or not)
I think this means we cannot back up those. HNS is required for NFSv3, and I think Storage V2 doesn't support NFSv3.
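For concreteness, the storage-account and role requirements above might look like this with the Azure CLI. All names and IDs here are placeholders, not our actual resources:

```shell
# --kind StorageV2 satisfies the STORAGE_V2 requirement; omitting
# --enable-hierarchical-namespace and --enable-nfs-v3 leaves HNS and
# NFSv3 disabled, as required.
az storage account create \
    --name shmbackupstorage \
    --resource-group rg-shm-backup \
    --kind StorageV2 \
    --sku Standard_GRS

# Grant the backup vault's managed identity the required role on the account.
az role assignment create \
    --assignee "<backup-vault-principal-id>" \
    --role "Storage Account Backup Contributor" \
    --scope "<storage-account-resource-id>"
```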
I think we shouldn't back up `/ingress`. It is read-only inside SREs, and it would be better to delete all copies than to forget to delete a copy and risk it leaking.
My guess would be that we want to back up:

- `/shared`
- `/egress`
- `/home`
If we are going to use a command-line tool instead of Azure resources, I think we should go with something like `borg`, which handles encryption, de-duplication and compression.
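As a sketch of that workflow, assuming a repository at `/backup/borg-repo` (a placeholder path) and the directory list suggested above:

```shell
# Non-interactive runs would set the passphrase via the environment.
export BORG_PASSPHRASE="<from-a-secret-store>"

# One-off: create an encrypted repository (key stored inside the repo).
borg init --encryption=repokey /backup/borg-repo

# Daily: create a de-duplicated, compressed archive of the target directories.
borg create --compression zstd \
    /backup/borg-repo::'{hostname}-{now:%Y-%m-%d}' \
    /shared /egress /home

# Thin out old archives to bound storage growth.
borg prune --keep-daily 7 --keep-weekly 4 /backup/borg-repo
```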
I'm suggesting using a command line tool to copy the files from a storage account that we can't back up (e.g. things we're mounting over NFS) into a storage account that we can back up.
I think we probably want the backup account to maintain the file structure of the things we're backing up, so we can easily restore single files or folders from backup. I could be convinced that it's better to store binary dumps from an archiving tool if there's a sensible restore-from-backup workflow that doesn't involve admins trying to run commands through the serial console!
Oh I see.
I think that would still require some manual intervention though. If we had `/backup`, which was managed by Azure Backup Vault, we could restore that directory but would still need to propagate any rollback to `/output`, `/shared`, etc.
It feels more robust to have a one-step process like `borgmatic restore` than to click some things in the portal and then run a script.
I'm sure we could have a CLI entrypoint which runs the restore commands.
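Such an entrypoint could be a short script along these lines (the repository path and directory list are assumptions carried over from above, not decided yet):

```shell
# Find the most recent archive in the repository.
latest=$(borg list --last 1 --format '{archive}' /backup/borg-repo)

# Extract it into the current (scratch) directory...
borg extract /backup/borg-repo::"$latest"

# ...then propagate the restored trees back over the live directories.
for dir in shared egress home; do
    rsync -a "./$dir/" "/$dir/"
done
```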
Here are some relevant DSPT requirements:

- How does your organisation make sure that there are working backups of all important data and information?
- Are backups routinely tested to make sure that data and information can be restored?
- Are your backups kept separate from your network ('offline'), or in a cloud service designed for this purpose?
I think Azure Backup meets the last one, but if we use `borg` we would need to work out how to store the backups "separate from our network".
> Are your backups kept separate from your network ('offline'), or in a cloud service designed for this purpose?
We should be careful with that; I think there would often be a legal obligation not to transfer the data outside of our network.
This is one of the places where I feel that DSPT wasn't designed for TREs. I think it is talking about off-site backup as in "If your building burned down, how would you make sure you don't lose everyone's medical records?". However, we don't expect to archive or curate data. We expect to permanently delete everything soon.
In our case, I think the equivalent of off-site is "If you tear down the workspaces and storage accounts, will you also lose the backups?" and "If the datacentre burns down, would you lose the backups?". We could achieve that by using different resources and redundant storage.
I was assuming this means that we'd need to either explicitly store backups at another datacentre location or use a very high redundancy storage account SKU.
Yes I think that is sensible and best practice.
:white_check_mark: Checklist

:strawberry: Suggested change

The v4 release series had a `backup` folder - we should do the same.

:steam_locomotive: How could this be done?