broadinstitute / lincs-cell-painting

Processed Cell Painting Data for the LINCS Drug Repurposing Project
BSD 3-Clause "New" or "Revised" License

Download profiles via dvc command line #71

Closed: benoalo closed this issue 3 years ago

benoalo commented 3 years ago

Hi @gwaygenomics and team,

We've worked on some cell profiling tools and would be interested in trying them on this dataset. Unfortunately, I am having trouble downloading the profile data. It would be great if you had any pointers to help with that.

So far I have tried the dvc get command-line tool (I am new to it but quite excited by the concept 😊), but I am probably doing something wrong. (I tried this on Windows 10 from a PowerShell terminal.)

[Screenshot attached: Capture]

Thanks for your help,

Kind regards, Benoit

gwaybio commented 3 years ago

Hi @benoalo - thanks for your interest in this repo!

I've tried that command on my local machine and it works just fine. I did a bit of sleuthing, and it looks like the issue might have to do with your AWS credentials. Maybe this is helpful: https://github.com/iterative/dvc/issues/3050#issuecomment-674340656

The AWS bucket is publicly available, so you should be able to download no problem. If you have any additional followup questions, or if you find the solution, please do let us know!

benoalo commented 3 years ago

Hi @gwaygenomics ,

Thank you for the pointer. My default AWS credential profile was the culprit, and changing it solved the issue.

Initially I didn't realize the link with AWS. Checking the remote location was useful for troubleshooting (i.e. with the dvc remote list command in the git repo). Then, with the aws cli, it was easy to find which credentials worked fine :-) .
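For readers retracing this troubleshooting step, here is a minimal Python sketch of the idea: list the locally configured AWS profiles and build one `aws s3 ls` probe per profile, plus an unsigned probe since the bucket is described as public. The helper names `list_profiles` and `probe_commands` are hypothetical, and the S3 URL is a placeholder you would replace with whatever `dvc remote list` prints in the repo. This is an illustration under those assumptions, not the exact procedure used in this thread.

```python
import configparser


def list_profiles(ini_text):
    """Return profile names found in AWS-style INI text.

    Handles both ~/.aws/credentials ("[name]") and ~/.aws/config
    ("[profile name]") section headers. Other section kinds that may
    appear in the config file (e.g. "sso-session ...") are not filtered out.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    names = []
    for section in parser.sections():
        prefix = "profile "
        name = section[len(prefix):] if section.startswith(prefix) else section
        if name not in names:
            names.append(name)
    return names


def probe_commands(profiles, s3_url):
    """Build one `aws s3 ls` command per profile, plus an anonymous one.

    s3_url is a placeholder: use the URL reported by `dvc remote list`.
    """
    commands = [["aws", "s3", "ls", s3_url, "--profile", p] for p in profiles]
    # The bucket is described as public, so an unsigned request is worth trying.
    commands.append(["aws", "s3", "ls", s3_url, "--no-sign-request"])
    return commands
```

To use it, read the text of `~/.aws/credentials` (and optionally `~/.aws/config`), pass it to `list_profiles`, and run each resulting command with `subprocess.run`. A profile whose command exits with status 0 is one that can read the remote.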

gwaybio commented 3 years ago

Great! Thanks for following up and confirming the solution. Let us know if you have any other questions.

FloHu commented 3 years ago

Hi @gwaygenomics , I have been less successful, it seems. I get the following error:

[Screenshot of the error message]

I thought it was because the add-jump-cp-role only allows access to the jump-cellpainting prefix. I have been able to download data from s3://jump-cellpainting, but @shntnu said this bucket should be publicly accessible.

@benoalo may I ask what aws cli tools you used to track down the issue?

gwaybio commented 3 years ago

Hi @FloHu - I believe you are looking for aws configure. Once you provide credentials, dvc pull should just work.

FloHu commented 3 years ago

OK it works now: the problem was that I assumed the jumpcprole (see also here) by default, which I had set up in order to access consortium data. Apparently though this role is less privileged than any normal user when it comes to accessing any s3 bucket outside of s3://jump-cellpainting. Running aws-sso to get a new access token for any user and then using this user profile solved the issue.
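As a rough illustration of the fix described above, the following Python sketch runs `dvc pull` under a specific AWS profile instead of whichever one is active by default. It assumes that DVC's S3 backend honors the `AWS_PROFILE` environment variable, and that credentials set directly in environment variables could otherwise take precedence over the profile, so it drops them. The helper names `env_for_profile` and `dvc_pull_with_profile` are hypothetical.

```python
import os
import subprocess

# Credentials set directly in the environment may override a named profile,
# so they are removed before selecting the profile (an assumption; adjust if
# your setup relies on them).
CREDENTIAL_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def env_for_profile(profile, base_env=None):
    """Return a copy of the environment that selects the given AWS profile."""
    env = dict(os.environ if base_env is None else base_env)
    for name in CREDENTIAL_ENV_VARS:
        env.pop(name, None)
    env["AWS_PROFILE"] = profile
    return env


def dvc_pull_with_profile(profile, repo_dir="."):
    """Run `dvc pull` in repo_dir using the given AWS profile."""
    return subprocess.run(
        ["dvc", "pull"],
        cwd=repo_dir,
        env=env_for_profile(profile),
        check=False,
    )
```

For example, `dvc_pull_with_profile("default")` would run the pull with the default user profile rather than a restricted role; the returned object's `returncode` indicates whether the pull succeeded.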

gwaybio commented 3 years ago

Awesome, glad this is resolved. And thanks for posting here. I think others might run into this same issue, so thanks for solving it for all!