dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

download a partial but valid dandiset #1075

Open satra opened 2 years ago

satra commented 2 years ago

it would be nice if there was an option to download a partial but valid dandiset. with search results on the horizon, this may become more relevant, but the thought process came up with the bids datasets.

let's say i want to fix some part of a dataset and upload it. if i download just a file and stick in a dandiset.yaml, this will be insufficient for bids validation, but would likely work for nwb. however, there is no current mechanism to do this easily, and definitely no mechanism for a valid bids dandiset.

for non-bids, we would have to download just the metadata of the dandiset and then within it figure out how to download a file and stick in the right path. it would be nice if we could simply point to a file for download and say optionally to make it a valid dandiset. this would mean it would behave like downloading a dandiset with a filter that restricts it to a set of files.
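to make the non-bids case concrete, here is a minimal sketch of the layout such a filtered download would have to produce. `plan_partial_dandiset` is a hypothetical helper, not part of dandi-cli, and it assumes the dandiset lives in a folder named after its id with `dandiset.yaml` at the root; it only computes target paths and downloads nothing.

```python
from pathlib import PurePosixPath


def plan_partial_dandiset(dandiset_id, selected_paths, output_dir="."):
    """Map each selected asset path to where it should land locally.

    Hypothetical helper (not a dandi-cli API). Assumes the local dandiset
    root is <output_dir>/<dandiset_id> and that dandiset.yaml sits at
    that root, next to the selected files at their original relative paths.
    """
    root = PurePosixPath(output_dir) / dandiset_id
    # The metadata file is always part of the plan.
    plan = {"dandiset.yaml": str(root / "dandiset.yaml")}
    for path in selected_paths:
        rel = PurePosixPath(path)
        # Refuse paths that would escape the dandiset root.
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"not a relative asset path: {path}")
        plan[path] = str(root / rel)
    return plan


if __name__ == "__main__":
    # Illustrative dandiset id and asset path, not taken from a real dandiset.
    for source, target in plan_partial_dandiset(
        "000027", ["sub-01/sub-01_ses-1.nwb"]
    ).items():
        print(f"{source} -> {target}")
```

this covers only the path bookkeeping. it does not attempt to work out which other files a bids validator would require.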

for bids, we would need additional considerations, since something would have to determine the bids requirements for validation for a given filter.

either way, given the sizes of the datasets, it would be nice to be able to download and manipulate and upload partial dandisets without necessarily going the datalad route.

yarikoptic commented 2 years ago

Well, dandi download does support partial downloads. We have --download dandiset.yaml and we can point to any specific file or folder.
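As a rough sketch of those two steps, the snippet below only builds the command lines and prints them, it does not run anything. The helper name `partial_download_commands`, the dandiset id, and the asset path are made up for illustration; the `dandi://<instance>/<dandiset id>/<path>` URL form and the `-o` output option are used as I understand them from the dandi documentation, so check `dandi download --help` before relying on this.

```python
import shlex


def partial_download_commands(dandiset_id, asset_path, instance="dandi", output_dir="."):
    """Build two `dandi download` command lines for a partial download.

    Hypothetical helper for illustration: the first command fetches only
    dandiset.yaml, the second fetches a single file or folder.
    """
    dandiset_url = f"dandi://{instance}/{dandiset_id}"
    # Step 1: metadata only, via the --download dandiset.yaml option.
    metadata_cmd = [
        "dandi", "download",
        "--download", "dandiset.yaml",
        "-o", output_dir,
        dandiset_url,
    ]
    # Step 2: one specific asset (use a trailing slash for a folder).
    asset_cmd = [
        "dandi", "download",
        "-o", output_dir,
        f"{dandiset_url}/{asset_path}",
    ]
    return [metadata_cmd, asset_cmd]


if __name__ == "__main__":
    # Illustrative id and path, not taken from a real dandiset.
    for cmd in partial_download_commands("000027", "sub-01/sub-01_ses-1.nwb"):
        print(shlex.join(cmd))
```

Note that the second command does not necessarily place the file at its original path relative to dandiset.yaml, so it may have to be moved into place by hand.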

> if we could simply point to a file for download and say optionally to make it a valid dandiset

For BIDS I don't think it is feasible to reliably encode all the logic needed to capture the dependencies between files, with all the inheritance principles etc. Even for NWB files we would need to check for all externally referenced files and then decide whether to download them or not.

> without necessarily going the datalad route.

oh well -- DataLad should work quite nicely for this use case and if not - would be easy to make it work. I don't see another simple way to accomplish the drill here really beyond what we already have + users figuring out what to download for such partial views.

satra commented 2 years ago

users will not be able to figure this out or necessarily use datalad. i would put it out of reach for many of these folks. for most datasets they will be able to download all of it. i agree with your assessment, but that's the challenge i am calling out here. figuring out what is needed.

once we have the bids validator in, we will want people to use it, not bypass it. giacomo is not going to have all the data as an example to upload. and even if he can use datalad, others on that team in different institutions absolutely cannot, and they are not going to download all the data either to upload a few excel files.

so they need a pathway to create a sub dataset that they can validate and upload. just a note, for them even the python client was a challenge and sometimes they sent files to another site to upload. usability cannot be only for technically savvy people, which is where it is currently.

yarikoptic commented 2 years ago

I hear you, but we also need to be pragmatic. So far I do not see a reliable and generic way to implement what you want. If you or someone else does, please propose it as a design document or an implementation PR. Alternatively, or in addition, we can have a zoom session to brainstorm this.