datalad / datalad-ria

Adds functionality for RIA stores to DataLad
http://datalad.org

Strategies for a simpler RIA-Store & 7 Zip archive workflow #41

Open psadil opened 1 year ago

psadil commented 1 year ago

What is the problem?

This started out as a thread on neurostars.org. Now it's a feature request!

My current understanding (from this answer) is that the process for making a 7-Zip archive of annexed files in a RIA store involves:

  1. Pushing all new annexed files to the store,
  2. Making an archive of annexed files, and then
  3. Removing the unarchived copies of the files.

For safety, these steps need to be done while also manually managing locks and ensuring that annexed files are synchronized.
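
The three steps above can be sketched roughly as follows. This is a minimal, hedged sketch, not the procedure from the linked answer: it assumes the commands are run from inside the dataset clone, that the sibling is called `ria-backup`, that the store exposes the dataset under `<store>/alias/<alias>`, and that the archive is expected at `archives/archive.7z` inside the dataset's directory in the store. The helper names `run` and `archive_ria_annex` are hypothetical. It does not handle locking, and it prints the commands by default instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the manual push -> archive -> clean-up cycle for a RIA store.
# ASSUMPTIONS (not confirmed by this thread): sibling name "ria-backup",
# alias symlink at <store>/alias/<alias>, archive at archives/archive.7z.
# No locking is done here; nothing else should write to the store meanwhile.

DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
  # Print the command; execute it only when DRY_RUN=0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

archive_ria_annex() {
  local ria="$1" alias="$2" sibling="${3:-ria-backup}"
  local ds_dir="${ria}/alias/${alias}"   # dataset location inside the store

  # 1. Push all new annexed files to the store.
  run datalad push --to "$sibling" || return 1

  # 2. Build the archive from the local annex object store.
  run mkdir -p "${ds_dir}/archives" || return 1
  run datalad export-archive-ora "${ds_dir}/archives/archive.7z" || return 1

  # Test archive integrity before touching the loose copies.
  run 7z t "${ds_dir}/archives/archive.7z" || return 1

  # 3. Remove the unarchived copies (destructive; permissions on annex
  #    objects may need adjusting first).
  run rm -rf "${ds_dir}/annex/objects"
}
```

Calling `archive_ria_annex /somepath mydata` prints the planned commands; setting `DRY_RUN=0` would execute them. Since `datalad export-archive-ora` archives the keys present in the local annex, all content would need to be available locally first (for example via `datalad get`).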

That works, but it would be convenient if there were a way to interact with a RIA store without knowledge of its internal structure -- that is, to have the annexed content be backed by the archive only.

What steps will reproduce the problem?

Create store

ria=/somepath
alias=mydata
datalad create-sibling-ria -s ria-backup --alias ${alias} --new-store-ok "ria+file://${ria}"

Now, when pushing annexed content to the new store, I would like DataLad to automagically archive the annexed files into a 7-Zip archive (archive.7z).

Here are a couple of ways that could look

DataLad information

❯ datalad --version
datalad 0.18.3

# operating system: darwin aarch64

❯ git annex version
git-annex version: 10.20230407
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.4 http-client-0.7.13.1 persistent-sqlite-2.13.1.1 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin aarch64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10

Additional context

datalad/datalad#5059 is related, but I think it had a slightly different target (and was closed without changes).

Have you had any success using DataLad before?

Yes! But the 7 Zip archive process felt awkward and so I left files unarchived.

adswa commented 1 year ago

Hey @psadil, sorry for the long silence here! Your ideas are interesting. We're busy with a different development project until the end of May, though, so there is too little time for us this month to discuss or act on this issue. However, as a heads-up, we have scheduled a ria-themed sprint for the month of June, and added this issue to our agenda - thanks for opening it!

psadil commented 1 year ago

@adswa, thanks for the update!