aiidateam / disk-objectstore

An implementation of an efficient "object store" (actually, a key-value store) writing files on disk and not requiring a running server
https://disk-objectstore.readthedocs.io
MIT License
16 stars 8 forks source link

backup functionality #161

Closed eimrek closed 8 months ago

eimrek commented 1 year ago

Hi @giovannipizzi @sphuber, as discussed (also in #20), I moved the disk-objectstore backup functionality from https://github.com/aiidateam/aiida-core/pull/6069 to this project.

It works currently, but I added everything currently in a separate file backup_utils.py, to keep it a bit decoupled for now. Also requires some polishing (e.g. the logging part).

However, there are some issues:

1) Code duplication w.r.t. https://github.com/aiidateam/aiida-core/pull/6069: i basically added the same utility functions to run commands on local and remote machines & to call rsync. Is this duplication fine or do you have some ideas how to change this design? 2) We say windows is supported for this repo (in the README.md), but I think relying on the unix commands breaks this. e.g. i'm using [ -e <path> ] to check if a folder exists. I used this so that I could use the same method for both local and remote paths. However, another option is to use python-specific checks for local paths (e.g. pathlib.Path.exists() for this case) and unix-specific ones for remote ones. This might be a way to keep the windows support, at least if doing backups to a local folder.

Any ideas welcome.

If this approach works, i can polish it further and add some tests.

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (90cd4c8) 99.89% compared to head (ac10458) 99.90%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #161 +/- ## ======================================== Coverage 99.89% 99.90% ======================================== Files 9 10 +1 Lines 1896 2066 +170 ======================================== + Hits 1894 2064 +170 Misses 2 2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

sphuber commented 1 year ago

Thanks @eimrek . To me the concept looks fine. I haven't gone in detail over the actual intricacies of the order in which you copy data such that it allows this operation during operation. I think it would be good to document the reasoning at least somewhere (either in the docs or in the docstrings of the backup util functions) to explain why this should be safe. The CLI command help should also explicitly state if it is safe to perform while the OS is being used.

Code duplication w.r.t. https://github.com/aiidateam/aiida-core/pull/6069: i basically added the same utility functions to run commands on local and remote machines & to call rsync. Is this duplication fine or do you have some ideas how to change this design?

Sure there is duplication now, but if we were to merge the code here and make a release, aiida-core could simply reuse the disk_objectstore.backup_utils module, couldn't it? You could directly use those functions in the PsqlDosBackend.backup method, so I don't see how you would have to duplicate the code.

We say windows is supported for this repo (in the README.md), but I think relying on the unix commands breaks this. e.g. i'm using [ -e ] to check if a folder exists. I used this so that I could use the same method for both local and remote paths. However, another option is to use python-specific checks for local paths (e.g. pathlib.Path.exists() for this case) and unix-specific ones for remote ones. This might be a way to keep the windows support, at least if doing backups to a local folder.

This will definitely break, since that is a bash command, which will not run natively on windows. But even if you were to replace this and find a cross-platform solution, you are still using rsync which also doesn't natively run on Windows. So I would say that with this approach, a cross-platform solution will be very difficult. I personally would be fine to just add a disclaimer to the CLI that the backup command is Linux/MacOS only. Don't think it will be very easy to find a cross-platform solution that is as performant and I doubt that there are currently a lot of Windows users that we should cater to.

@giovannipizzi thoughts?

sphuber commented 11 months ago

@eimrek let me know if you want me to have a look at this and or have particular questions. Since you marked it as WIP I haven't gone through it yet

eimrek commented 11 months ago

@sphuber sure. i think the main features are there, i just need to add tests.

eimrek commented 11 months ago

Note after discussion with @edan-bainglass : his usecase (mounting a windows folder to a linux container and backing up there) might not support symlinks. At least in my case, where to mount a windows folder to my linux virtual machine doesn't, by default. Therefore, it might make sense to avoid using symlinks altogether.

eimrek commented 11 months ago

Hi @sphuber @giovannipizzi, this PR is in a state where your feedback would be great. the functionality is there and i added main CLI tests. But some notes/comments regarding things i'm not sure about:

eimrek commented 10 months ago

Alright, this is finally finalized. I changed the design such that there is a "metaclass" BackupManager that contains all relevant config options and utilities (e.g. calling rsync and other commands) and folder management (using correct rsync link-dest, making backup to a live folder and then moving to final destination when done, etc). The most important method is backup_auto_folders(backup_func: Callable), which argument is the actual backup function and will be different for the disk-objectstore container and aiida-core storage. Will adapt https://github.com/aiidateam/aiida-core/pull/6069 accordingly ASAP. I also added a docs page for the backups.

Any suggestions welcome :)

eimrek commented 10 months ago

@sphuber @giovannipizzi feel free to submit your review if you have a moment to check.