SIMEXP / Repo2Data

Automatic data fetcher from the web
MIT License

OSF fetch directives for subset data #17

agahkarakuzu closed this issue 2 years ago

agahkarakuzu commented 2 years ago

Right now, when an OSF link is detected, repo2data attempts osf clone. This works when cloning a whole project, but fails when the OSF link points to an individual file within a project.

Failing config:

{ "src": "https://osf.io/kjgcs",
  "dst": "./data",
  "projectName": "repo2data_osf",
  "recursive": true}

The osf command that does what the user is asking for:

osf -p 6zbyf fetch data/20160918_sct_example_data.zip ./data/20160918_sct_example_data.zip
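
As a rough illustration of how that command could be assembled from a project id, a remote path, and a destination folder, here is a minimal Python sketch. The helper name `build_osf_fetch_command` is hypothetical and not part of repo2data; the only CLI form it relies on is the `osf -p <project> fetch <remote> <local>` invocation shown above.

```python
import posixpath


def build_osf_fetch_command(project_id, remote_path, dst_dir):
    """Build the argv list for fetching a single file from an OSF project.

    Hypothetical helper: it only mirrors the command shown in this issue.
    """
    # Keep the remote file name and place it inside the destination folder.
    local_path = posixpath.join(dst_dir, posixpath.basename(remote_path))
    return ["osf", "-p", project_id, "fetch", remote_path, local_path]


cmd = build_osf_fetch_command(
    "6zbyf", "data/20160918_sct_example_data.zip", "./data"
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, assuming the osf command-line client is installed.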

For subsets of a project, data_requirement.json can be something like:

{ "src": "https://osf.io/6zbyf",
  "remotePath": "data/20160918_sct_example_data.zip",
  "dst": "./data",
  "projectName": "repo2data_osf",
  "recursive": true}

Something like this, or whatever you see fit. We should document that, for OSF, the src MUST be the link containing the project ID. If only a subset is needed, remotePath should be provided.
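
The proposed behaviour could be sketched roughly as follows: take the project id from src, then clone the whole project when remotePath is absent and fetch a single file when it is present. The function names `osf_project_id` and `plan_osf_download` are hypothetical, and this is not repo2data's actual implementation. Note that a file link and a project link have the same shape, so the code cannot tell them apart; that is why the documentation requirement above matters.

```python
from urllib.parse import urlparse


def osf_project_id(src):
    """Extract the id from a URL such as https://osf.io/6zbyf.

    Surrounding whitespace is tolerated. Hypothetical helper.
    """
    parts = [p for p in urlparse(src.strip()).path.split("/") if p]
    if len(parts) != 1:
        raise ValueError(
            f"expected a URL of the form https://osf.io/<id>, got {src!r}"
        )
    return parts[0]


def plan_osf_download(requirement):
    """Return ("clone", id, None) or ("fetch", id, remote_path)."""
    project_id = osf_project_id(requirement["src"])
    remote_path = requirement.get("remotePath")
    if remote_path is None:
        # No subset requested: download the whole project.
        return ("clone", project_id, None)
    # Subset requested: fetch only the given path inside the project.
    return ("fetch", project_id, remote_path)


subset = {
    "src": "https://osf.io/6zbyf",
    "remotePath": "data/20160918_sct_example_data.zip",
    "dst": "./data",
    "projectName": "repo2data_osf",
}
print(plan_osf_download(subset))
```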

ltetrel commented 2 years ago

Indeed, what I am currently doing is using osf clone: https://github.com/SIMEXP/Repo2Data/blob/03826c8ebd7612606d508cadb6079117f8092764/repo2data/repo2data.py#L203-L212

Should not remotePath be a list then?

agahkarakuzu commented 2 years ago

> Should not remotePath be a list then?

It could be a list to iterate over. I think accepting both would be the user-friendly approach: if it is a list, iterate over it; if not, do a single run. People may not be fans of array literals for a single entry:

{
"remotePath": ["something/something"]
}
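
One way to support both forms is to normalize remotePath to a list before iterating. The sketch below is only an illustration of that idea; `normalize_remote_paths` is a hypothetical name, not an existing repo2data function.

```python
def normalize_remote_paths(remote_path):
    """Return remotePath as a list, whether it was a string, a list, or absent."""
    if remote_path is None:
        # No subset requested.
        return []
    if isinstance(remote_path, str):
        # A single entry written without an array literal.
        return [remote_path]
    return list(remote_path)


for path in normalize_remote_paths("something/something"):
    print("would fetch:", path)
```

The caller can then loop over the result in every case, without special handling for a single entry.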

ltetrel commented 2 years ago

And just to make sure: in "remotePath": "data/20160918_sct_example_data.zip", the data folder is the one inside OSF, right?

agahkarakuzu commented 2 years ago

Indeed:

image

ltetrel commented 2 years ago

Also, one final thing: @agahkarakuzu, can you point me to a repo that wants to use this functionality? Ideally one that runs through Binder, so that I can also test on NeuroLibre at the same time.

jvelazquez-reyes commented 2 years ago

@ltetrel this is the repo I'm trying to use repo2data in: https://github.com/jvelazquez-reyes/sct-book

agahkarakuzu commented 2 years ago

@ltetrel the two links in the first comment also make an example, by the way.

ltetrel commented 2 years ago

@jvelazquez-reyes @agahkarakuzu https://github.com/SIMEXP/Repo2Data#osf

I will ping you when this release is available on NeuroLibre; I still need to update things on the cluster.