denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

Research replacement for git-annex b2 plugin #203

Open GeoffFroh opened 3 years ago

GeoffFroh commented 3 years ago

The DDR backup workflow uses the git-annex-remote-b2 plugin for git-annex sync and git-annex copy --to b2 operations, allowing us to create special remotes directly on b2. The plugin (https://github.com/encryptio/git-annex-remote-b2) is fully compatible with the current version of git-annex and hasn't caused any problems, but it has not been updated in quite a while. It would therefore be a good idea to start researching a different b2 integration before something breaks.

One possibility is to use the rclone plugin instead. @gjost will do the initial research; then we can do full testing/implementation when there is time in the schedule.

See also: https://spaces.densho.org/pages/viewpage.action?spaceKey=DR&title=DDR+Data+Protection+and+Backup

See also: https://git-annex.branchable.com/tips/using_Backblaze_B2/

gjost commented 3 years ago

Debian has packages for rclone and git-annex-remote-rclone (https://github.com/DanielDent/git-annex-remote-rclone). Installing them is a simple addition to ddr-cmdln's Makefile. The syntax for initializing a special remote is similar, but some of the settings differ:

Existing b2:
git annex initremote REMOTENAME type=external externaltype=b2 bucket=BUCKETNAME prefix=DDR_ID encryption=none

Rclone:
git annex initremote REMOTENAME type=external externaltype=rclone target=BUCKETNAME prefix=DDR_ID chunk=50MiB encryption=shared mac=HMACSHA512 rclone_layout=LAYOUT
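To make the differences between the two invocations easier to see, here is a minimal Python sketch that builds both commands as argument lists and reports which settings differ. The function names (b2_initremote, rclone_initremote, differing_settings) are hypothetical and not part of ddr-cmdln. The settings and default values are copied from the two commands above and are not a tested configuration.

```python
import shlex


def b2_initremote(remote, bucket, prefix):
    """Argument list for the existing git-annex-remote-b2 initremote call."""
    return [
        "git", "annex", "initremote", remote,
        "type=external", "externaltype=b2",
        f"bucket={bucket}", f"prefix={prefix}",
        "encryption=none",
    ]


def rclone_initremote(remote, target, prefix, layout,
                      chunk="50MiB", encryption="shared", mac="HMACSHA512"):
    """Argument list for the proposed git-annex-remote-rclone call.

    Defaults mirror the example in this issue; they are assumptions,
    not settings that have been verified against our existing remotes.
    """
    return [
        "git", "annex", "initremote", remote,
        "type=external", "externaltype=rclone",
        f"target={target}", f"prefix={prefix}",
        f"chunk={chunk}", f"encryption={encryption}", f"mac={mac}",
        f"rclone_layout={layout}",
    ]


def settings(argv):
    """Extract the key=value settings from an initremote argument list."""
    return dict(arg.split("=", 1) for arg in argv if "=" in arg)


def differing_settings(old_argv, new_argv):
    """Map each setting that differs to a tuple (old value, new value)."""
    old, new = settings(old_argv), settings(new_argv)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    old = b2_initremote("REMOTENAME", "BUCKETNAME", "DDR_ID")
    new = rclone_initremote("REMOTENAME", "BUCKETNAME", "DDR_ID", "LAYOUT")
    print(shlex.join(old))
    print(shlex.join(new))
    for key, (before, after) in differing_settings(old, new).items():
        print(f"{key}: {before} -> {after}")
```

Settings that appear in only one command show up with None on the other side, so the report covers both changed values (such as encryption=none versus encryption=shared) and settings that exist for only one plugin (such as bucket versus target).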

IMPORTANT: We have a lot of existing special remotes. We don't want to recopy them, and we especially don't want duplicate files taking up twice the space. Rclone offers five different methods for determining filenames and directory structures (the rclone_layout setting), and encryption and chunk size are probably important too. We need to figure out an Rclone setup that produces exactly the same file layout as our existing one.
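One way to check whether a candidate Rclone setup reproduces the existing layout is to compare file listings from the two remotes. The sketch below is only an illustration: it assumes each listing is plain text with one relative path per line (something like the output of rclone lsf -R --files-only might work, but that has not been tried here). The function names and sample paths are made up.

```python
def parse_listing(text):
    """Turn a newline-separated file listing into a set of relative paths."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_layouts(existing_listing, candidate_listing):
    """Compare the paths stored by the existing remote and a candidate remote.

    Returns the paths found on only one side, plus an overall verdict.
    The comparison is case-sensitive, so a layout that merely changes
    the case of a directory or filename counts as a mismatch.
    """
    existing = parse_listing(existing_listing)
    candidate = parse_listing(candidate_listing)
    return {
        "only_in_existing": sorted(existing - candidate),
        "only_in_candidate": sorted(candidate - existing),
        "identical": existing == candidate,
    }


if __name__ == "__main__":
    # Hypothetical listings; real key names and directories will differ.
    existing = "DDR_ID/KEY-ONE\nDDR_ID/KEY-TWO\n"
    candidate = "DDR_ID/KEY-ONE\nDDR_ID/abc/KEY-TWO\n"
    report = compare_layouts(existing, candidate)
    for name in ("only_in_existing", "only_in_candidate"):
        for path in report[name]:
            print(f"{name}: {path}")
    print("identical:", report["identical"])
```

Running this against a small test bucket for each rclone_layout value, with encryption and chunk settings held fixed, would show which combination (if any) leaves both "only in" lists empty.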

gjost commented 3 years ago

Note that an Rclone version of the ddrpubcopy ... --b2 code is outside this issue's scope. Converting it to use Rclone would mean writing an alternative to DDR.storage.Backblaze, which is not what we're doing here.