datalad / datalad-ria

Adds functionality for RIA stores to DataLad
http://datalad.org

cloning from remote RIA sibling requires password for remote user on source dataset local host #20

Open jmschabdach opened 3 years ago

jmschabdach commented 3 years ago

What is the problem?

When I clone from a RIA sibling on the remote machine itself, DataLad asks for the remote user's password, addressed to the host alias used on the originating (local) machine.

What steps will reproduce the problem?

I set up my ~/.ssh/config to contain user and domain info for the remote per the suggestions in #5475.

Locally, in the directory containing the dataset:

> datalad create-sibling-ria -s cubic-ria --shared all ria+ssh://cubic-login:/cbica/projects/bgdimagecentral/transfer
> datalad push --to cubic-ria

Remote:

> cd /cbica/projects/bgdimagecentral/transfer
> mkdir alias
> cd alias
> ln -s /cbica/projects/bgdimagecentral/transfer/80e/... myclone
> # Clone the dataset from ria to regular
> cd ../..
> mkdir ria_clone_test
> cd ria_clone_test
> datalad clone ria+file:///cbica/projects/bgdimagecentral/transfer#~myclone .
[INFO   ] Scanning for unlocked files (this may take some time)   
schabdaj@cubic-login's password: 

Here, schabdaj is the remote user and cubic-login is my local SSH nickname for the remote domain.
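To make the mismatch concrete: the URL recorded when the sibling was created embeds the SSH alias cubic-login, while the URL used for the clone on the remote has no host part at all. The helper below is a minimal sketch that splits both URLs with Python's standard `urllib.parse`. `describe_ria_url` is a hypothetical name, not part of DataLad, and it assumes a `ria+` prefix in front of an ordinary URL plus an optional `#~alias` fragment naming an entry in the store's `alias/` directory; DataLad's own URL handling may differ.

```python
from urllib.parse import urlparse


def describe_ria_url(url):
    """Split a RIA URL into its parts (hypothetical helper, not DataLad's parser).

    Assumes the form ria+<scheme>://[host][:]/<store path>[#~<alias>].
    """
    if not url.startswith("ria+"):
        raise ValueError(f"not a RIA URL: {url!r}")
    # Drop the "ria+" prefix and parse the remaining ordinary URL.
    parsed = urlparse(url[len("ria+"):])
    fragment = parsed.fragment
    return {
        # Transport used to reach the store ("ssh", "file", ...).
        "transport": parsed.scheme,
        # SSH host or alias; None for file:// URLs, which have no host part.
        "host": parsed.hostname,
        "store_path": parsed.path,
        # "#~name" refers to an entry in the store's alias/ directory.
        "alias": fragment[1:] if fragment.startswith("~") else None,
    }


if __name__ == "__main__":
    for url in (
        "ria+ssh://cubic-login:/cbica/projects/bgdimagecentral/transfer",
        "ria+file:///cbica/projects/bgdimagecentral/transfer#~myclone",
    ):
        print(describe_ria_url(url))
```

Only the first URL mentions cubic-login, which suggests that a prompt for schabdaj@cubic-login comes from the URL stored at sibling creation time rather than from the clone URL.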

What version of DataLad are you using (run datalad --version)? On what operating system (consider running datalad wtf)?

Remote: datalad 0.14.3 Linux centos/7/Core

Local: datalad 0.14.3 Mac Darwin 19.6.0

Is there anything else that would be useful to know in this context?

The way users work on the remote is a little unusual, but I double-checked that everyone in the specific user group has rwx access to the relevant files and directories.

Have you had any success using DataLad before? (to assess your expertise/prior luck. We would welcome your testimonial additions to https://github.com/datalad/datalad/wiki/Testimonials as well)

I set up the RIA sibling successfully and am familiar with git.

bpoldrack commented 3 years ago

Hey @jmschabdach,

Sorry for the late response! I agree that this is a weird-looking effect. However, just to be sure I'm on the right track: what happens if you enter a wrong password (or just leave it empty) at the password prompt(s)? I'd assume that datalad proceeds and ultimately succeeds in accessing the store locally instead of via SSH. Is that right?

If so, I assume the problem is as follows: when you created the storage sibling (a git-annex special remote), it stored the SSH URL in its configuration. On cloning, datalad/git-annex checks whether it can enable that special remote, and if that fails, datalad tries to reconfigure it based on the RIA URL you cloned from (rather than the URL it was created with). The catch is that it tries the SSH version first, and that attempt has to fail before the access method is reconfigured to match the clone URL.
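The order of attempts is the crux: the stored SSH URL is tried first, so the password prompt appears before any fallback can happen. The toy model below captures only that ordering. `choose_store_url` and `can_access` are hypothetical names, and real git-annex/DataLad enabling involves more steps than a single access check.

```python
from typing import Callable, List, Tuple


def choose_store_url(
    stored_url: str,
    clone_url: str,
    can_access: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Toy model of the fallback described above (hypothetical, not DataLad code).

    The URL recorded when the special remote was created is tried first;
    only if that attempt fails is the clone URL used instead.
    Returns the URL that worked and every URL that was attempted.
    """
    attempted: List[str] = []
    for candidate in (stored_url, clone_url):
        attempted.append(candidate)
        if can_access(candidate):
            return candidate, attempted
    raise RuntimeError(f"no usable URL among {attempted}")


if __name__ == "__main__":
    stored = "ria+ssh://cubic-login:/cbica/projects/bgdimagecentral/transfer"
    cloned = "ria+file:///cbica/projects/bgdimagecentral/transfer#~myclone"

    # Simulate a user whose SSH login fails: only local file access succeeds.
    def ssh_refused(url: str) -> bool:
        return url.startswith("ria+file://")

    used, tried = choose_store_url(stored, cloned, ssh_refused)
    print("used:", used)
    print("tried:", tried)
```

In this model the SSH attempt always happens first, which is why a failed (or answered) password prompt precedes the switch to local access.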

The datalad master branch (not yet released) contains an enhancement that partly addresses this, in the sense that it explicitly supports workflows where read and write access methods differ: when setting up a RIA sibling, you can distinguish between a url and a push-url. You could then use, for example, HTTP (or file, in your case) for the url and SSH only for the push-url, which avoids the need for reconfiguration.
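A rough sketch of the read/write split, assuming the enhancement behaves as described (a separate push URL used only for writing, with the plain URL used otherwise). `url_for` and its operation labels are hypothetical; the actual option name and semantics are defined by the DataLad release that ships the feature, so check `datalad create-sibling-ria --help` there.

```python
from typing import Optional


def url_for(operation: str, url: str, push_url: Optional[str] = None) -> str:
    """Pick the store URL for an operation (hypothetical sketch).

    Writes ("push") use push_url when one is configured; every other
    operation, and pushes without a push_url, use url.
    """
    if operation == "push" and push_url is not None:
        return push_url
    return url


if __name__ == "__main__":
    read_url = "ria+file:///cbica/projects/bgdimagecentral/transfer"
    write_url = "ria+ssh://cubic-login:/cbica/projects/bgdimagecentral/transfer"
    # Reads use the local file URL, so no SSH connection is attempted.
    print(url_for("get", read_url, write_url))
    # Writes go over SSH via the push URL.
    print(url_for("push", read_url, write_url))
```

With reads never touching the SSH URL, a clone on the remote host would have no SSH attempt to fail first.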

However, this doesn't really solve the issue; it just makes it less likely that you run into it. A "real" solution isn't immediately obvious, though, and I need to think about it. Still, it would be nice to have it confirmed that letting the SSH authentication fail works for you, just to be sure there isn't another issue hiding in there.

jmschabdach commented 3 years ago

Hi @bpoldrack

Thanks for following up.

Actually, when I enter a wrong password or leave the password blank, I get a "Permission denied, please try again." message.

When I enter the correct password, the sibling is configured and installed (at least according to the messages). Running datalad get in the directory allows me to access the files.

It works, but I'm still confused about what's going on under the hood. If I'm cloning via ria+file from one directory on the remote to another directory on the remote, why would it ask for the password using the host alias from my local machine? My concern is for when other users clone from the remote. If Alice needs to clone the dataset, will she need to enter a password for alice@cubic-login (my local nickname for the remote)? Could this limit the transferability of the data?

It's a lot of speculation, but I'm curious what your thoughts are.

Thanks, Jenna

mih commented 3 years ago

Just to say that this is not forgotten. It is related to the reconfiguration of a dataset based on its source location (when to do it, and when NOT to). This is a rather complex issue with many facets (e.g. https://github.com/datalad/datalad/issues/5628). We are currently settling on an improved behavior that should also cover your case.