datalad / datalad-ria

Adds functionality for RIA stores to DataLad
http://datalad.org
Other

RIA siblings cannot deal with auto-renaming of remotes #47

Open mlell opened 1 year ago

mlell commented 1 year ago

What is the problem?

When a dataset is cloned from a RIA store and that clone is cloned again, the storage sibling of the RIA store is not named correctly and probably cannot be enabled. I think this is because DataLad does not expect git-annex to auto-rename a remote when the repository being cloned itself has a remote of the same name.
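
To check a clone for remotes that were renamed this way, something like the following could be used. It is only a heuristic sketch: `list_renamed_remotes` is a hypothetical helper, not part of DataLad or git-annex, and it assumes the renaming always appends `-<number>` to the original name, as with the `origin-2` seen in this report.

```shell
# Hypothetical helper (not part of DataLad or git-annex): list remotes that
# look auto-renamed, i.e. NAME-<number> where a remote NAME with a URL exists.
list_renamed_remotes() {
    repo=$1
    git -C "$repo" remote | while IFS= read -r name; do
        # Strip a trailing "-<digits>" suffix, if there is one.
        base=$(printf '%s\n' "$name" | sed -E 's/-[0-9]+$//')
        # Report the remote only if the un-suffixed name is also configured.
        if [ "$base" != "$name" ] &&
           git -C "$repo" config --get "remote.$base.url" >/dev/null; then
            printf '%s\n' "$name"
        fi
    done
}
```

Called as `list_renamed_remotes b`, it prints one candidate remote name per line. A remote that was deliberately named with a numeric suffix would be reported as well.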

What steps will reproduce the problem?

$ datalad create a
[INFO   ] Creating a new annex repo at /qg-10/data/AGR-QG/lell/test/datalad-chain/a
create(ok): /qg-10/data/AGR-QG/lell/test/datalad-chain/a (dataset)

$ cd a

$ datalad create-sibling-ria -s origin ria+file://$PWD/../ria
[INFO   ] create siblings 'origin' and 'origin-storage' ...
[INFO   ] Fetching updates for Dataset(/qg-10/data/AGR-QG/lell/test/datalad-chain/a)
[INFO   ] Configure additional publication dependency on "origin-storage"
create-sibling-ria(ok): /qg-10/data/AGR-QG/lell/test/datalad-chain/a (dataset)

$ cd ..

$ datalad clone a b
[INFO   ] Fetching updates for Dataset(/qg-10/data/AGR-QG/lell/test/datalad-chain/b)
[INFO   ] Could not enable annex remote origin-2. This is expected if origin-2 is a pure Git remote, or happens if it is not accessible.
[WARNING] Could not detect whether origin-2 carries an annex. If origin-2 is a pure Git remote, this is expected.
update(ok): . (dataset)
configure-sibling(ok): . (sibling)
install(ok): /qg-10/data/AGR-QG/lell/test/datalad-chain/b (dataset)
action summary:
  configure-sibling (ok: 1)
  install (ok: 1)
  update (ok: 1)

DataLad information

```
$ datalad wtf
/data/public/datalad/datalad-venv/lib64/python3.6/site-packages/secretstorage/util.py:23: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography (40.0) will be the last to support Python 3.6.
  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
# WTF
## configuration
## credentials
  - keyring:
    - active_backends:
      - PlaintextKeyring with no encyption v.1.0 at /home/lell/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /home/lell/.config/python_keyring/keyringrc.cfg
    - data_root: /home/lell/.local/share/python_keyring
## datalad
  - version: 0.15.6
## dataset
  - branches:
    - git-annex@eb53c5b
    - master@c36a9c4
  - id: e4085857-d5e5-4c2b-b44e-d2ce0adddb0d
  - metadata:
  - path: /data/lell/test/datalad-chain/b
  - repo: AnnexRepo
## dependencies
  - annexremote: 1.6.0
  - appdirs: 1.4.4
  - boto: 2.49.0
  - cmd:annex: 8.20211118-g23ee48898
  - cmd:bundled-git: UNKNOWN
  - cmd:git: 2.34.0
  - cmd:system-git: 2.34.0
  - cmd:system-ssh: 7.4p1
  - humanize: 3.14.0
  - iso8601: 1.1.0
  - keyring: 23.4.1
  - keyrings.alt: 4.1.0
  - msgpack: 1.0.4
  - requests: 2.27.1
  - wrapt: 1.14.1
## environment
  - GIT_ANNEX_APP_BASE: /data/public/git-annex
  - GIT_ANNEX_DIR: /data/public/git-annex
  - GIT_ANNEX_LD_LIBRARY_PATH: /data/public/git-annex//lib/x86_64-linux-gnu:
  - GIT_ANNEX_STANDLONE_ENV: PATH GCONV_PATH GIT_EXEC_PATH GIT_TEMPLATE_DIR MANPATH LOCPATH
  - GIT_BRANCH: master
  - GIT_EXEC_PATH: /data/public/git-annex/git-core
  - GIT_TEMPLATE_DIR: /data/public/git-annex/templates
  - LANG: en_US.UTF-8
  - PATH: /data/public/datalad/datalad-venv/bin:/data/public/git-annex/bin:/data/lell/.bin:/home/lell/.local/bin:/data/lell/.local/bin:/data/lell/.usr/bin:/filer/agruppen/qg/Computing-Cluster/scripts:/home/lell/perl5/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/bin:/data/public/git-annex/extra:/usr/local/bin
## extensions
## git-annex
  - build flags:
    - Assistant
    - Webapp
    - Pairing
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Feeds
    - Testsuite
    - S3
    - WebDAV
  - dependency versions:
    - aws-0.22
    - bloomfilter-2.0.1.0
    - cryptonite-0.26
    - DAV-1.3.4
    - feed-1.3.0.1
    - ghc-8.8.4
    - http-client-0.6.4.1
    - persistent-sqlite-2.10.6.2
    - torrent-10000.1.1
    - uuid-1.3.13
    - yesod-1.6.1.0
  - key/value backends:
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
    - X*
  - local repository version: 8
  - operating system: linux x86_64
  - remote types:
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - httpalso
    - borg
    - hook
    - external
  - supported repository versions:
    - 8
  - upgrade supported from repository versions:
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
  - version: 8.20211118-g23ee48898
## location
  - path: /data/lell/test/datalad-chain/b
  - type: dataset
## metadata_extractors
  - annex (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: None
    - module: datalad.metadata.extractors.annex
    - version: None
  - audio (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: ModuleNotFoundError(No module named 'mutagen')
    - module: datalad.metadata.extractors.audio
  - datacite (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: None
    - module: datalad.metadata.extractors.datacite
    - version: None
  - datalad_core (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: None
    - module: datalad.metadata.extractors.datalad_core
    - version: None
  - datalad_rfc822 (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: None
    - module: datalad.metadata.extractors.datalad_rfc822
    - version: None
  - exif (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: ModuleNotFoundError(No module named 'exifread')
    - module: datalad.metadata.extractors.exif
  - frictionless_datapackage (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: None
    - module: datalad.metadata.extractors.frictionless_datapackage
    - version: None
  - image (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: ModuleNotFoundError(No module named 'PIL')
    - module: datalad.metadata.extractors.image
  - xmp (datalad 0.15.6):
    - distribution: datalad 0.15.6
    - load_error: ModuleNotFoundError(No module named 'libxmp')
    - module: datalad.metadata.extractors.xmp
## metadata_indexers
## python
  - implementation: CPython
  - version: 3.6.8
## system
  - distribution: CentOS Linux/7.9.2009/Core
  - encoding:
    - default: utf-8
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - max_path_length: 300
  - name: Linux
  - release: 5.4.225-200.el7.x86_64
  - type: posix
  - version: #1 SMP Mon Jan 2 15:36:14 UTC 2023
```

Additional context

No response

Have you had any success using DataLad before?

No response

adswa commented 1 year ago

Thanks for the report! I gave your reproducer (thanks much!) a quick try. I can't seem to reproduce it with a more recent datalad.

(handbook) adina@muninn in /tmp
❱ datalad create a
create(ok): /tmp/a (dataset)
(handbook) adina@muninn in /tmp
❱ cd a

(handbook) adina@muninn in /tmp/a on git:master
❱ datalad create-sibling-ria -s origin ria+file://$PWD/../ria      
create-sibling-ria(error): /tmp/a (dataset) [No store found at '/tmp/a/../ria'. Forgot --new-store-ok ?]
(handbook) adina@muninn in /tmp/a on git:master
❱ datalad create-sibling-ria -s origin --new-store-ok ria+file://$PWD/../ria      
[INFO   ] create siblings 'origin' and 'origin-storage' ... 
[INFO   ] Fetching updates for Dataset(/tmp/a) 
update(ok): . (dataset)
update(ok): . (dataset)
[INFO   ] Configure additional publication dependency on "origin-storage" 
configure-sibling(ok): . (sibling)
create-sibling-ria(ok): /tmp/a (dataset)
action summary:  
  configure-sibling (ok: 1)
  create-sibling-ria (ok: 1)
  update (ok: 1)
(handbook) adina@muninn in /tmp/a on git:master
❱ cd ..
(handbook) adina@muninn in /tmp
❱ datalad clone a b
[INFO   ] Fetching updates for Dataset(/tmp/b)                                  
update(ok): . (dataset)
update(ok): . (dataset)
configure-sibling(ok): . (sibling)
install(ok): /tmp/b (dataset)
action summary:
  configure-sibling (ok: 1)
  install (ok: 1)
  update (ok: 1)
(handbook) adina@muninn in /tmp
❱ cd b
(handbook) adina@muninn in /tmp/b on git:master
❱ git remote -v
origin  ../a (fetch)
origin  ../a (push)
origin-2    /tmp/a/../ria/e21/e1696-4462-4c84-be22-eac41fbc6279 (fetch)
origin-2    /tmp/a/../ria/e21/e1696-4462-4c84-be22-eac41fbc6279 (push)
origin-storage  
(handbook) adina@muninn in /tmp/b on git:master
❱ datalad wtf -S datalad -S git-annex
# WTF
## datalad 
  - version: 0.18.2+16.gaa7170e0a
## git-annex 
  - build flags: 
    - Assistant
    - Webapp
    - Pairing
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Benchmark
    - Feeds
    - Testsuite
    - S3
    - WebDAV
  - dependency versions: 
    - aws-0.22.1
    - bloomfilter-2.0.1.0
    - cryptonite-0.29
    - DAV-1.3.4
    - feed-1.3.2.1
    - ghc-9.0.2
    - http-client-0.7.13.1
    - persistent-sqlite-2.13.1.0
    - torrent-10000.1.1
    - uuid-1.3.15
    - yesod-1.6.2.1
  - key/value backends: 
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
    - X*
  - local repository version: 10
  - operating system: linux x86_64
  - remote types: 
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - httpalso
    - borg
    - hook
    - external
  - supported repository versions: 
    - 8
    - 9
    - 10
  - upgrade supported from repository versions: 
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
    - 8
    - 9
    - 10
  - version: 10.20221003

Could you retry after updating the tools?

mlell commented 1 year ago

Thank you for this hint! I had only just installed DataLad and missed that the installed version was limited by our server's ancient Python version. I have now upgraded to Python 3.7.3 (the latest that I could easily compile myself on our cluster) and DataLad 0.18.2.

Indeed the problem is largely gone and the "origin" of repo A is renamed:

$ datalad siblings
.: here(+) [git]
.: origin-2(-) [(omitted)/../ria/5d0/493e1-44eb-42eb-8777-29b0ea6e5b43 (git)]
.: origin-storage(+) [ora]
.: origin(+) [../a (git)]

There remain only two things:

$ git -C a config --list | grep ^remote
remote.origin-storage.annex-externaltype=ora
remote.origin-storage.annex-uuid=43dcfcde-2c1a-4621-99ff-6a5464297255
remote.origin-storage.skipfetchall=true                            # <<< this is not inherited on cloning ===========
remote.origin-storage.annex-cost=100.0
remote.origin-storage.annex-availability=GloballyAvailable
remote.origin.annex-ignore=true                                    # <<< this is not inherited on cloning ===========
remote.origin.url=/data/lell/test/datalad-chain/a/../ria/5d0/493e1-44eb-42eb-8777-29b0ea6e5b43
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
remote.origin.datalad-publish-depends=origin-storage               # <<< this is not inherited on cloning ===========

$ git -C b config --list | grep ^remote
remote.origin.url=../a
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
remote.origin.annex-uuid=b1a84354-0031-4c73-a608-ef9997a93234
remote.origin-storage.annex-externaltype=ora
remote.origin-storage.annex-uuid=43dcfcde-2c1a-4621-99ff-6a5464297255
remote.origin-storage.annex-cost=100.0
remote.origin-storage.annex-availability=GloballyAvailable
remote.origin-2.url=/data/lell/test/datalad-chain/a/../ria/5d0/493e1-44eb-42eb-8777-29b0ea6e5b43
remote.origin-2.fetch=+refs/heads/*:refs/remotes/origin-2/*
remote.origin-2.annex-ignore=false
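
As a possible manual workaround, the settings that are not inherited could be copied from `a` into `b` by hand. This is only a sketch: `copy_ria_remote_config` is a hypothetical helper, not a DataLad command, and it assumes that the three keys marked above are the only ones that matter. I have not verified that this fully restores the behavior of `a`.

```shell
# Hypothetical helper (not a DataLad command): copy the remote settings that
# were marked as "not inherited" from SRC_REMOTE in SRC_REPO to DST_REMOTE
# in DST_REPO. Keys that are unset in the source are skipped.
copy_ria_remote_config() {
    src_repo=$1; src_remote=$2; dst_repo=$3; dst_remote=$4
    for key in annex-ignore datalad-publish-depends skipfetchall; do
        # `git config --get` exits non-zero when the key is not set.
        if value=$(git -C "$src_repo" config --get "remote.$src_remote.$key"); then
            git -C "$dst_repo" config "remote.$dst_remote.$key" "$value"
        fi
    done
}
```

For the layout in this report, the calls would be `copy_ria_remote_config a origin b origin-2` for the Git remote of the RIA store and `copy_ria_remote_config a origin-storage b origin-storage` for the storage sibling.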

mlell commented 1 year ago

Test for publication dependency problem:

$ cd b
$ echo "test" > x
$ datalad save
$ datalad push --to origin-2
publish(ok): . (dataset) [refs/heads/master->origin-2:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->origin-2:refs/heads/git-annex [new branch]]
action summary:
  copy (notneeded: 1)
  publish (ok: 2)
$ git annex find --not --in origin-storage

  remote origin-2:This repository is not initialized for use by git-annex, but /qg-10/data/AGR-QG/lell/test/datalad-chain/a/../ria/5d0/493e1-44eb-42eb-8777-29b0ea6e5b43/annex/objects/ exists, which indicates this repository was used by git-annex before, and may have lost its annex.uuid and annex.version configs. Either set back missing configs, or run git-annex init to initialize with a new uuid.
x

(The relevant part of the last command's output is the x at the very end, which indicates that the file x was not uploaded to the RIA store's annex. The warning before it might be caused by the config remote.origin-2.annex-ignore not being inherited from the a repo either.)

For comparison, if we first push x from b to "origin" (-> a) and then from a to "origin" (-> ria), the ORA sibling of the RIA store is updated:

$ cd a
$ git config receive.denyCurrentBranch updateInstead
$ cd ../b
$ datalad push --to origin
$ cd ../a
$ datalad push --to origin
$ git annex find --not --in origin-storage
# -- no output, so x is uploaded --