Closed adswa closed 2 years ago
Merging #112 (a7321c8) into master (e59722a) will decrease coverage by 0.00%. The diff coverage is 83.33%.
@@ Coverage Diff @@
## master #112 +/- ##
==========================================
- Coverage 81.79% 81.78% -0.01%
==========================================
Files 59 59
Lines 4768 4771 +3
==========================================
+ Hits 3900 3902 +2
- Misses 868 869 +1
Impacted Files | Coverage Δ | |
---|---|---|
datalad_crawler/nodes/annex.py | 80.99% <83.33%> (-0.07%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
@yarikoptic I do need your help with one other thing. I think it is related to the Annexificator. After https://github.com/datalad/datalad/pull/6105/commits/ab852c43ea591618191ce15fe7b8906bcbe65801 and https://github.com/datalad/datalad/pull/6105/commits/ee83851fb7be69f36e16dcec8ed0a69583604c8f (a refactoring to use ensure_datalad_remote (using repo.get_special_remotes() internally) to check for preexisting datalad-archives remotes), there is one crawler test that fails:
======================================================================
ERROR: datalad_crawler.nodes.tests.test_annex.test_add_archive_content_tar
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/adina/env/handbook2/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/adina/repos/datalad/datalad/tests/utils.py", line 1163, in _wrap_assert_cwd_unchanged
raise exc_info[1]
File "/home/adina/repos/datalad/datalad/tests/utils.py", line 1135, in _wrap_assert_cwd_unchanged
ret = func(*args, **kwargs)
File "/home/adina/repos/datalad/datalad/tests/utils.py", line 558, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "/home/adina/repos/datalad-crawler/datalad_crawler/nodes/tests/test_annex.py", line 227, in test_add_archive_content_tar
output_addarchive = list(
File "/home/adina/repos/datalad-crawler/datalad_crawler/nodes/annex.py", line 1275, in _add_archive_content
add_archive_content(
File "/home/adina/repos/datalad/datalad/interface/utils.py", line 484, in eval_func
return return_func(generator_func)(*args, **kwargs)
File "/home/adina/repos/datalad/datalad/interface/utils.py", line 476, in return_func
results = list(results)
File "/home/adina/repos/datalad/datalad/interface/utils.py", line 396, in generator_func
for r in _process_results(
File "/home/adina/repos/datalad/datalad/interface/utils.py", line 579, in _process_results
for res in results:
File "/home/adina/repos/datalad/datalad/interface/add_archive_content.py", line 401, in __call__
ensure_datalad_remote(ds.repo, remote=ARCHIVES_SPECIAL_REMOTE,
File "/home/adina/repos/datalad/datalad/customremotes/base.py", line 590, in ensure_datalad_remote
init_datalad_remote(repo, remote,
File "/home/adina/repos/datalad/datalad/customremotes/base.py", line 560, in init_datalad_remote
return repo.init_remote(remote, remote_opts + opts)
File "/home/adina/repos/datalad/datalad/support/annexrepo.py", line 1878, in init_remote
self.call_annex(['initremote'] + [name] + options)
File "/home/adina/repos/datalad/datalad/support/annexrepo.py", line 1170, in call_annex
return self._call_annex(
File "/home/adina/repos/datalad/datalad/support/annexrepo.py", line 924, in _call_annex
return runner.run(
File "/home/adina/repos/datalad/datalad/runner/runner.py", line 145, in run
raise CommandError(
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c annex.alwayscommit=false annex initremote datalad-archives encryption=none type=external autoenable=true externaltype=datalad-archives uuid=c04eb54b-4b4e-5755-8436-866b043170fa -c annex.dotfiles=true' failed with exitcode 1 under /tmp/datalad_temp_tree_test_add_archive_content_tariz6chk26 [err: 'git-annex: There is already a special remote named "datalad-archives". (Use enableremote to enable an existing special remote.)']
----------------------------------------------------------------------
Ran 10 tests in 11.709s
That is, the check for already existing special remotes failed to detect the one in the test repo. Digging into why this may be, I found something weird: repo.get_special_remotes returns all known special remotes, enabled or not, by querying the remote.log of the git-annex branch. In the created test repo, this query fails. Here is the relevant debug output:
datalad.runner.runner: DEBUG : Finished ['git', '-c', 'diff.ignoreSubmodules=none', 'cat-file', 'blob', 'git-annex:remote.log'] with status 128
datalad.dataset.gitrepo: Level 11: CommandError: 'git -c diff.ignoreSubmodules=none cat-file blob git-annex:remote.log' failed with exitcode 128 under /tmp/datalad_temp_tree_test_add_archive_content_tarutj6f3bz [err: 'fatal: Not a valid object name git-annex:remote.log']
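To make the failure mode concrete, here is a minimal sketch of such a lookup that tolerates a missing remote.log. It is not DataLad's implementation: the function names are hypothetical, and the parser assumes the commonly documented remote.log layout of one line per remote, "<uuid> key=value ... timestamp=...", with no spaces inside values.

```python
import subprocess


def parse_remote_log(text):
    """Parse remote.log content into {uuid: {key: value}}.

    Assumes one "<uuid> key=value ..." line per entry; if a uuid occurs
    more than once, the last line simply wins (no timestamp comparison).
    """
    remotes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        uuid, _, rest = line.partition(" ")
        remotes[uuid] = dict(
            item.split("=", 1) for item in rest.split() if "=" in item
        )
    return remotes


def read_special_remotes(repo_path):
    """Return the parsed remote.log, or {} if it cannot be read."""
    proc = subprocess.run(
        ["git", "cat-file", "blob", "git-annex:remote.log"],
        cwd=repo_path,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Covers "fatal: Not a valid object name git-annex:remote.log",
        # which is what the debug output above shows for the test repo.
        return {}
    return parse_remote_log(proc.stdout)
```

With a lookup like this, an empty result for the test repo would be indistinguishable from "no special remotes exist", which matches the behavior seen here: the check finds nothing and initremote is attempted.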
Looking further into the test repo, something is off about it. It appears to have a master branch:
adina@muninn in /tmp/datalad_temp_tree_test_add_archive_content_tarr7fknqkb on git:master+
❱ git st
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: 1.tar
adina@muninn in /tmp/datalad_temp_tree_test_add_archive_content_tarr7fknqkb on git:master+
... but yet it doesn't?
adina@muninn in /tmp/datalad_temp_tree_test_add_archive_content_tarr7fknqkb on git:master+
❱ git branch
git-annex
❱ ls .git/refs/heads
git-annex
And the git-annex branch isn't actually a git-annex branch:
adina@muninn in /tmp/datalad_temp_tree_test_add_archive_content_tarr7fknqkb on git:master+
❱ git co git-annex 1 !
A 1.tar
Switched to branch 'git-annex'
adina@muninn in /tmp/datalad_temp_tree_test_add_archive_content_tarr7fknqkb on git:git-annex+
❱ ls
1.tar
This is because the repository does not have a single commit yet. The 1.tar archive is only staged, there is no initial commit, and no real git-annex branch has been established, and thus there is no remote.log.
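A test setup could guard against this state explicitly. The following is a rough sketch of detecting an unborn HEAD by inspecting the .git directory directly. The helper name is hypothetical, and it assumes a plain repository layout with loose or packed refs (no linked worktrees, no reftable backend).

```python
from pathlib import Path


def head_is_unborn(git_dir):
    """Return True if HEAD names a branch that has no commit yet."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        # A detached HEAD stores a commit hash directly, so a commit exists.
        return False
    ref = head[len("ref: "):]
    # Loose ref, e.g. .git/refs/heads/master
    if (git_dir / ref).is_file():
        return False
    # Packed refs: data lines look like "<hash> <refname>"
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # header comment or peeled-tag line
            parts = line.split(" ", 1)
            if len(parts) == 2 and parts[1].strip() == ref:
                return False
    return True
```

For the repo shown above, HEAD points at refs/heads/master while .git/refs/heads only contains git-annex, so a check along these lines would report the branch as unborn.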
Previously, add-archive-content relied on annex.get_remotes() to check for pre-existing remotes (thereby apparently missing initialized but unenabled special remotes (datalad#1693), but succeeding in this special case of a repo). get_remotes queries .git/config for remotes instead of checking git-annex's remote.log.
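The difference between the two views can be illustrated with a small sketch that compares remote names found in .git/config text against names recorded in remote.log text. This is only an illustration with hypothetical helper names, not how get_remotes or get_special_remotes are implemented, and it again assumes "<uuid> key=value ..." lines in remote.log.

```python
import re


def configured_remotes(git_config_text):
    """Names of remotes that have a [remote "<name>"] section in .git/config."""
    return set(
        re.findall(r'^\s*\[remote "([^"]+)"\]', git_config_text, re.MULTILINE)
    )


def known_special_remotes(remote_log_text):
    """Names of special remotes recorded in remote.log (name=<value> fields)."""
    names = set()
    for line in remote_log_text.splitlines():
        for item in line.split()[1:]:  # skip the leading uuid
            key, sep, value = item.partition("=")
            if key == "name" and sep:
                names.add(value)
    return names


def compare_remote_views(git_config_text, remote_log_text):
    """Report remotes that only one of the two sources knows about."""
    configured = configured_remotes(git_config_text)
    known = known_special_remotes(remote_log_text)
    return {
        # Initialized somewhere but not enabled in this clone (cf. datalad#1693).
        "known_but_not_enabled": known - configured,
        # In .git/config but absent from remote.log; this also includes
        # ordinary git remotes such as "origin".
        "configured_but_not_in_log": configured - known,
    }
```

When remote.log is missing altogether, every configured remote ends up in the second set, which is the situation a check based only on remote.log cannot see.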
How can I ensure that the special remote is found, without reverting https://github.com/datalad/datalad/pull/6105/commits/ab852c43ea591618191ce15fe7b8906bcbe65801 and https://github.com/datalad/datalad/pull/6105/commits/ee83851fb7be69f36e16dcec8ed0a69583604c8f? I think I'm missing something about what this test setup does or is supposed to do.
With datalad/datalad#6135 merged, I think this one is ready to go
https://github.com/datalad/datalad/pull/6105 refactors add-archive-content to be a dataset method. This requires changes in datalad-crawler's use/adaptation of the function. First, we need to pass a dataset instance. Second, I had to disable the integrity check of 'annex', which used to be returned by add-archive-content, but isn't anymore.
Locally, this makes the changes in https://github.com/datalad/datalad/pull/6105 not break any crawler tests anymore. It would be great to have your opinion on this, @yarikoptic.