Open jennydaman opened 1 year ago
For my actual data, I tried deleting and then recreating the sibling as shown above, then pushing again. However, it says the push is not needed even though the data is not present in the RIA.
datalad siblings
.: here(+) [git]
.: ria(-) [/neuro/labs/grantlab/research/Jennings/var/datalad_ria/51a/40309-d98a-40aa-9891-298d42215e7f (git)]
.: ria-storage(+) [ora]
du -hs /neuro/labs/grantlab/research/Jennings/var/datalad_ria
31M /neuro/labs/grantlab/research/Jennings/var/datalad_ria
du -hs .
13G .
datalad push --to ria -r
action summary:
copy (notneeded: 638)
publish (notneeded: 4)
Hey @jennydaman, thanks for the detailed issue! Most of what you describe are, sadly, known deficiencies of the current implementation, and we have plans to completely redo the RIA functionality to iron them out. Other urgent projects have delayed this so far, however, and I hope we can get to it in the coming weeks.
The first issue you describe, the ria-sibling lingering around in the background, reproduces. It is not unfixable, but we don't have ready-made datalad commands to do that work. Generally, you would either reconfigure the special remote (with the git-annex command `git annex enableremote`) or use a different special remote name in your new RIA store. Unlike Git remotes, git-annex special remotes are not easy to remove (a strong safeguard to prevent two special remotes with the same name). Even when `datalad siblings` or `git remote -v` does not list them, you would still find the presumably removed special remote when running `git annex info`. If you really wanted to remove it, you could declare the `ria-storage` special remote as dead (`git annex dead ria-storage`). This would hide it almost completely. But to be able to reuse the same special remote name from scratch, you would also need to forcefully purge it (`git annex forget --drop-dead --force`) (see here). As that link highlights, however, this isn't really recommended from the git-annex side, and reconfiguration with `git annex enableremote` would be the preferred way. I'll try to find some examples and documentation on this.
EDIT: This is a useful example for reconfiguring a special remote. And I also just found that there is a very recent new git-annex command that allows you to rename the old special remote, which would be much easier than what I outlined above: https://git-annex.branchable.com/git-annex-renameremote/
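To make the two routes above concrete, here is a minimal sketch of how they could be wrapped. The helper names `run`, `rename_special_remote`, and `purge_special_remote`, the `DRY_RUN` switch, and the new name `ria-storage-old` are made up for illustration; only the `git annex` subcommands themselves come from the discussion above. Whether a rename alone frees the old name for a new RIA store in your setup is something to verify on a throwaway clone first.

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Gentler route: rename the stale special remote.
# $1: current name, $2: new name (both chosen by the caller).
rename_special_remote() {
    run git annex renameremote "$1" "$2"
}

# Route that git-annex discourages: mark the remote dead, then purge
# dead remotes from the git-annex branch history.
purge_special_remote() {
    run git annex dead "$1" &&
        run git annex forget --drop-dead --force
}
```

Running `DRY_RUN=1 rename_special_remote ria-storage ria-storage-old` inside the dataset only prints the command, which is a cheap way to review what would happen before touching the git-annex branch.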
The `TypeError` is bad; thanks much for the report, I will look into it.
The errors you mention also justify an apology: they're on our list of known annoyances and we have plans to remove them, but we haven't gotten to it yet. Sorry about them.
The failure to push is curious, because the `create-sibling-ria` command would configure a publication dependency between the `ria` remote that you are pushing to and the `ria-storage` special remote. This is what it looks like in the dataset's `.git/config` file (last line):
[remote "ria-storage"]
    annex-externaltype = ora
    annex-uuid = 3c97b679-851e-4f3c-8891-80ca56e9bb2b
    skipFetchAll = true
    annex-cost = 100.0
    annex-availability = GloballyAvailable
[remote "ria"]
    annex-ignore = true
    url = /tmp/ria/c0a/51fd6-981c-4170-852f-7b69be4f1867
    fetch = +refs/heads/*:refs/remotes/ria/*
    datalad-publish-depends = ria-storage
Can you check whether this configuration exists for your sibling as well? My suspicion is that this configuration is missing, so it only pushes what is in Git `--to ria`, and ignores all annexed contents.
If the configuration is in place, have you tried an explicit `push --to ria-storage`?
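One way to run that check without opening `.git/config` by hand is to ask `git config` for the key directly. The sketch below assumes the key is stored as `remote.<sibling>.datalad-publish-depends`, as in the excerpt above; the function name `publish_depends` and its optional file argument are hypothetical conveniences.

```shell
# Hypothetical helper: print the publication dependencies configured for a sibling.
# $1: sibling name
# $2: optional path to a git config file (defaults to the current repository's config)
publish_depends() {
    if [ -n "${2:-}" ]; then
        git config --file "$2" --get-all "remote.$1.datalad-publish-depends"
    else
        git config --get-all "remote.$1.datalad-publish-depends"
    fi
}
```

Inside the dataset, `publish_depends ria` should print `ria-storage` when the dependency is set, and return a non-zero exit status with no output when it is missing. If it is missing, `datalad siblings configure -s ria --publish-depends ria-storage` should, as far as I know, set it again, but please treat that as a suggestion to verify rather than a confirmed fix.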
The reason for the KeyError lies here: https://github.com/datalad/datalad/blob/40332b5ad25bf8744f7399f6c3575f7d28f71384/datalad/distributed/create_sibling_ria.py#L622-L653
When create-sibling-ria fails because the storage sibling already exists, it attempts a reconfiguration on its own. But then further down, it attempts to get the special remote's UUID from .git/config, which had been removed from there, and thus returns the None that later sends the configuration command into a KeyError.
If the special remote sibling weren't removed with `datalad siblings remove -s ria-storage`, I believe your code would actually work.
@adswa thank you for the thorough explanation.
`git annex` is still a mystery to me and I haven't felt this confused since learning git for the first time. I wish there were an easy way to cleanly reset the datalad siblings and git annex remotes (but keep my data and datalad history). I've tried deleting then repushing the 10GB RIA location 3 times but it seems to just make things worse. Well, everything works, but I keep accumulating more and more orphaned "git annex things."
What is the problem?
I want to be able to completely reset the configuration of siblings for a dataset. However, the `datalad siblings remove` command does not cleanly remove all configurations for the dataset.
What steps will reproduce the problem?
First, create an example dataset:
Next, I attempt to undo the effects of `datalad create-sibling-ria`:
At this point, as expected, `datalad siblings` reports that the only sibling is `here`:
However, I am unable to recreate the sibling:
Moreover, a TypeError is encountered if you use `--existing reconfigure`:
DataLad information
Additional context
The above is a minimal reproduction of the error I am encountering while trying to use `datalad` with my real data. I've encountered other problems related to removed siblings which are harder to reproduce. Some errors went away after rerunning `datalad create-sibling-ria`. Also, when I run `datalad clone ...` I get errors related to removed siblings:
Here, `/neuro/labs/grantlab/research/Jennings/var/datalad_ria` is a valid DataLad RIA, whereas `/neuro/labs/grantlab/research/Jennings/datalad_ria/innersp_fitting_data_analysis` was deleted.
Have you had any success using DataLad before?
No response