datalad / datalad-osf

DataLad extension to interface with the Open Science Framework

Reconnecting osf project (sibling) with github repo #192

Closed hvgazula closed 10 months ago

hvgazula commented 10 months ago

I created a GitHub repo and configured its storage sibling (on OSF) in the past. Now I have cloned the GitHub repo on a different machine, and running datalad siblings shows only here and origin, with no sign of the storage sibling created earlier. How do I configure this sibling in the newly cloned repo?

hvgazula commented 10 months ago

I may have figured out a way to find the name of the storage sibling using git annex enableremote. Now, I am wondering if there's a way to find the URL of that remote. Any help would be much appreciated. Thanks.

adswa commented 10 months ago

Hi, thanks for the issue! Could you tell me which mode you used for the OSF sibling? I suppose you followed the tutorial for "Use case 3" in the docs? If yes, I'm a bit puzzled why the storage sibling was not autoenabled - typically, it should automatically configure itself, and with that also show up as a sibling. A first idea that came to my mind is that maybe the cloning from GitHub was done in an environment where datalad-osf was not installed? In that case, it couldn't do the necessary configuration, but you should see a message about a missing git-annex-remote-osf during clone.

hvgazula commented 10 months ago

That's correct. I followed use case 3, and datalad-osf is clearly installed in my environment. In the attached snapshot, you will see the difference in the output when I run siblings (with both git clone and datalad clone). Nevertheless, assuming datalad clone is the correct way to go about cloning, how do I know the URL of the remote storage?

[image: screenshot comparing the datalad siblings output after git clone and after datalad clone]

hvgazula commented 10 months ago

PS: I used the default annex mode.

adswa commented 10 months ago

Actually, the outcome of datalad clone in your screenshot looks like it yielded the desired result. The OSF sibling test-aws-storage is enabled, and should be able to obtain your file content. Just to make sure I don't misunderstand - is there anything in test-aws-dl that doesn't work?

If you run git annex info test-aws-storage or git annex whereis <path-to-file-with-content-on-osf>, you should see the special remote name in square brackets, which indicates that it's enabled.

E.g., something is wrong here:

(handbook-upgrade) adina@muninn in /tmp/gh-test on git:master
❱ git annex whereis file
whereis file (2 copies) 
    b9798061-9f74-4100-b838-1b54c4b3a73b -- adina@muninn:/tmp/osf-test
    c46cb1dd-08b5-4964-ae29-a9c41872f6a2 -- osf-storage
ok
(handbook-upgrade) adina@muninn in /tmp/gh-test on git:master
❱ git annex info osf-storage
uuid: c46cb1dd-08b5-4964-ae29-a9c41872f6a2
description: osf-storage
trust: semitrusted
remote annex keys: 1
remote annex size: 6 bytes

But things are fine here:

❱ git annex whereis file
whereis file (2 copies) 
    b9798061-9f74-4100-b838-1b54c4b3a73b -- adina@muninn:/tmp/osf-test
    c46cb1dd-08b5-4964-ae29-a9c41872f6a2 -- [osf-storage]
ok
(handbook-upgrade) adina@muninn in /tmp/gh-test on git:master
❱ git annex info osf-storage
uuid: c46cb1dd-08b5-4964-ae29-a9c41872f6a2
description: [osf-storage]
trust: semitrusted
remote: osf-storage
cost: 200.0
type: external
externaltype: osf
encryption: none
chunking: none
remote annex keys: 1
remote annex size: 6 bytes
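
The bracket check can also be scripted. This is only a minimal sketch, assuming the whereis output looks like the two examples above; the function name remote_is_enabled is hypothetical and not part of git-annex or datalad-osf:

```shell
#!/bin/sh
# remote_is_enabled NAME (hypothetical helper)
# Reads `git annex whereis` output on stdin and succeeds if NAME
# appears in square brackets, i.e. the special remote is enabled.
remote_is_enabled() {
    # -F: treat the brackets literally; -q: no output, exit status only
    grep -F -q -- "[$1]"
}

# Usage inside a dataset (file and remote names taken from the example):
#   git annex whereis file | remote_is_enabled osf-storage && echo "enabled"
```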

> how do I know the URL of the remote-storage?

Unlike many other special remotes, the way that we identify the remote storage on OSF is via its "node" (the OSF ID). This is recorded in the git-annex branch of the dataset in the file remote.log. You can query this file without switching branches using git cat-file -p git-annex:remote.log:

(handbook-upgrade) adina@muninn in /tmp/gh-test on git:master
❱ git cat-file -p git-annex:remote.log
c46cb1dd-08b5-4964-ae29-a9c41872f6a2 autoenable=true encryption=none externaltype=osf name=osf-storage node=n6jq2 type=external timestamp=1707230604.980686718s

So in this example, the corresponding OSF URL would be https://osf.io/n6jq2/. But you wouldn't need to provide this URL to enableremote. If the osf remote helper is installed (even if it gets installed only after cloning), git annex enableremote should find all the necessary info in the dataset on its own.
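
For convenience, the node lookup can be turned into a small filter. This is a sketch under the assumption that remote.log entries look like the line shown above (space-separated key=value pairs with externaltype=osf and node=<id>); the function name osf_urls is made up for illustration:

```shell
#!/bin/sh
# osf_urls (hypothetical helper)
# Reads remote.log content on stdin and prints one OSF project URL
# for every entry that declares externaltype=osf and has a node= field.
osf_urls() {
    awk '/externaltype=osf/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^node=/) {
                sub(/^node=/, "", $i)          # keep only the OSF ID
                print "https://osf.io/" $i "/"
            }
    }'
}

# Usage inside a dataset:
#   git cat-file -p git-annex:remote.log | osf_urls
```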

hvgazula commented 10 months ago

Awesome. Everything works just fine. All I wanted to know was the storage URL. Admittedly, I did not pay much attention to the remote.log file in the git-annex branch. Thank you very much for the detailed explanation.

hvgazula commented 10 months ago

> If yes, I'm a bit puzzled why the storage sibling was not autoenabled - typically, it should automatically configure itself, and with that also show up as a sibling.

One last thing before I close this out. Clearly datalad clone seemed to have done the job (showing all siblings) unlike git clone. Is that an expected behavior with git clone? I am happy to test this again if not.

adswa commented 10 months ago

> One last thing before I close this out. Clearly datalad clone seemed to have done the job (showing all siblings) unlike git clone. Is that an expected behavior with git clone? I am happy to test this again if not.

Yes, for special remotes, Git alone doesn't suffice. This is mostly because the concept of special remotes is git-annex specific, not because it's the OSF. datalad clone (or a later manual git-annex command) needs to put the necessary configuration on top.
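
The "later manual git-annex command" route after a plain git clone could look roughly like this. It is a sketch, not a verified recipe: it assumes datalad-osf (and with it git-annex-remote-osf) is installed, that the remote is called osf-storage as in the example above, and the wrapper name enable_special_remote is hypothetical:

```shell
#!/bin/sh
# enable_special_remote NAME (hypothetical wrapper)
# Run inside a repository obtained with plain `git clone`.
enable_special_remote() {
    # Initialize git-annex in this clone (harmless if already initialized)
    git annex init || return 1
    # Enable the special remote from the info recorded in the git-annex branch
    git annex enableremote "$1"
}

# Usage:
#   enable_special_remote osf-storage
#   datalad siblings    # the storage sibling should now be listed
```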

hvgazula commented 10 months ago

@adswa Just wanted to let you know that running datalad siblings twice on a repo cloned using git clone shows the siblings correctly the second time. 🤷‍♂️

[image: screenshot of the datalad siblings output]