Open ramirezfranciscof opened 1 year ago
Since, as of v2.0, it is possible to provide custom storage backends (as done, for example, by aiida-s3), we should take into account that the method of backing up a core.psql_dos backend is not necessarily always the correct one.
Ideally, then, we would define a method on the StorageBackend interface that creates a backup of its contents, as well as a method to restore a backend from a created backup. In this way, we can have a single verdi command that automates the entire backup process. It can provide options to back up just the storage of any profile, or to back up the entire instance, including configuration and log files.
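To make the idea concrete, here is a minimal sketch of what such a pair of methods could look like. It is not the actual aiida-core interface: the class names `BackupCapableStorage` and `DirectoryStorage`, the method names `backup` and `restore`, and the helper `backup_profile` are all hypothetical and chosen only for illustration.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class BackupCapableStorage(ABC):
    """Hypothetical stand-in for a storage backend interface with backup hooks."""

    @abstractmethod
    def backup(self, destination: Path) -> None:
        """Write a backup of the storage contents to ``destination``."""

    @abstractmethod
    def restore(self, source: Path) -> None:
        """Replace the storage contents with the backup found at ``source``."""


class DirectoryStorage(BackupCapableStorage):
    """Toy backend whose entire state is a single directory of files."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def backup(self, destination: Path) -> None:
        # Copy the whole directory tree; existing files are overwritten.
        shutil.copytree(self.root, destination, dirs_exist_ok=True)

    def restore(self, source: Path) -> None:
        # Drop the current state, then copy the backup back in place.
        if self.root.exists():
            shutil.rmtree(self.root)
        shutil.copytree(source, self.root)


def backup_profile(storage: BackupCapableStorage, destination: Path) -> None:
    """What a single generic command could do: delegate to the backend."""
    storage.backup(destination)
```

The point of the design is that the generic command never needs to know how a given backend stores its data; each backend implements the strategy that is most efficient for it.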
One big challenge will be to have the backup/restore methods of the StorageBackend class be performant and work whenever possible without root access. In the past, we would provide manual instructions for backing up the default storage backend since that was the most efficient, i.e., by directly going to psql to dump the database and using rsync for the file repository.
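As a rough illustration of those manual steps, the sketch below builds the two commands from Python. It assumes `pg_dump` is used for the database dump and `rsync` for the file repository; the function names, the parameter names, and the layout of the destination directory (`db.psql` and `repository/`) are hypothetical, not something AiiDA prescribes.

```python
import subprocess
from typing import List


def build_backup_commands(
    db_name: str,
    db_user: str,
    db_host: str,
    db_port: int,
    repo_path: str,
    dest: str,
) -> List[List[str]]:
    """Return the command lines for a database dump and a repository copy."""
    dest = dest.rstrip("/")
    pg_dump = [
        "pg_dump",
        "--host", db_host,
        "--port", str(db_port),
        "--username", db_user,
        "--file", f"{dest}/db.psql",  # hypothetical dump file name
        db_name,
    ]
    rsync = [
        "rsync",
        "--archive",  # preserve permissions, timestamps and symlinks
        "--delete",   # remove files from the copy that no longer exist
        f"{repo_path.rstrip('/')}/",  # trailing slash: copy the contents
        f"{dest}/repository/",        # hypothetical target directory
    ]
    return [pg_dump, rsync]


def run_backup(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Both tools run with the permissions of the calling user and the database credentials, which is part of why this route could work without root access, assuming the user can connect to the database.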
> One big challenge will be to have the backup/restore methods of the StorageBackend class be performant and work whenever possible without root access.
Why do you mention this specifically? I would agree that one should try to do as much as possible without root access, but if it is necessary, the user should just be prompted for a password when running the command.
> Why do you mention this specifically? I would agree that one should try to do as much as possible without root access, but if it is necessary, the user should just be prompted for a password when running the command.
For the same reason that users often experience problems using verdi quicksetup if they don't have root access. Users on these platforms won't be able to make backups if it requires root access and they don't have it.
Yeah, good point, I forget that users may not have root access on their workstation...
Heya, I would suggest a possible alternative/complementary solution here: provide functionality to "sync" backend instances. This is effectively what you are doing now when you create/import an archive (since v2 archives are effectively just an instance of a sqlite_zip backend); the limitation at the moment is that you can only create "full" archives, as opposed to having incremental updates.
If you could, for example, sync a "local" psql_dos backend with a "remote" aiida-s3 backend, then you have a backup.
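The incremental part of such a sync can be sketched with a toy model in which each backend is a plain dictionary mapping a unique identifier to stored content. This is only an illustration of the idea, not how any AiiDA backend works: the function name `sync_backends` is hypothetical, and the sketch assumes stored entries are never modified after creation, so only missing identifiers need to be copied.

```python
from typing import Dict


def sync_backends(source: Dict[str, bytes], target: Dict[str, bytes]) -> int:
    """Copy entries that exist in ``source`` but not in ``target``.

    Returns the number of entries transferred, so that a repeated sync
    with no new data transfers nothing.
    """
    missing = source.keys() - target.keys()
    for key in missing:
        target[key] = source[key]
    return len(missing)


# Usage: the "remote" backend already holds one of the two local entries.
local = {"uuid-1": b"first", "uuid-2": b"second"}
remote = {"uuid-1": b"first"}
transferred = sync_backends(local, remote)
```

After the call, `remote` holds both entries and only `uuid-2` had to be transferred, which is the difference between an incremental update and recreating a "full" archive every time.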
This obviously relates also to https://github.com/aiidateam/aiida-core/issues/4535
In terms of also syncing configuration and log files, that would be an open question. I think there are already one or more open issues about including the configuration in the archive.
(my suggestion ☝️ is somewhat alluded to in the initial issue, but I wanted to make it more concrete)
@eimrek @sphuber this can be closed now?
I guess the backup part is there, but it stands to be argued that restoring can be made a lot easier.
Motivation
Proper digital data management requires one to keep copies of the information in case of system failures on the main work devices. AiiDA has a well established method for transmitting information between installations by using the verdi archive command to export/import sets of nodes. However, even when selecting to export all nodes in the database, this may leave out information related to the configurations of the working profile. There is some documentation on creating backups, but it is somewhat convoluted and may even have become outdated since the latest modifications in aiida-core. This means there is currently no official recommended procedure for backing up AiiDA installations.
Desired Outcome
Have a clear recommended procedure for backing up and restoring full AiiDA profiles/installations. Add any features and/or utility scripts in aiida-core that can automate some or all of the steps, and review/update the respective documentation section.
Impact
All users should benefit from improved backup procedures.
Complexity
Originally creating the backup just required 3 steps:
config.json configuration file
Since all of this was performed outside of AiiDA, it is unclear what would happen if this procedure was started while the AiiDA instance was being used (and nodes were created / modified during steps or between them, leading to inconsistent parts). Moreover, the recent changes to include the disk-objectstore (which added another SQLite database inside the file repository) add an extra level of complexity to live backups.
We need to evaluate if we can provide a more streamlined and secure way for users to create backups, perhaps even adding new verdi functionalities to automate one or more of these steps in a safer manner. We must also decide if it is possible to do more modular backups (of single profiles, for example) or if it is too inconvenient to do anything other than full system installation backups.
Finally, this procedure may also need to be re-structured if we implement some pull/push mechanism in the future (or replaced by it altogether).
Extra Notes
Progress