Open aaronkanzer opened 4 months ago
I think there are two separate but related issues here (and solving 2. depends on solving 1. first):
dandi download --download dandiset.yaml <dandiset-url>
to boil down/implement desired convenience.
NB: while trying different URI schemes I found a "workaround side-effect": if the path is passed as a glob (which might not be generally applicable/desired), we get the leading path too:
❯ dandi download https://dandiarchive.org/dandisets/000027/versions/0.210831.2033/assets/\?glob\=sub-RAT123/sub-RAT123.nwb
PATH                      SIZE    DONE    DONE% CHECKSUM STATUS MESSAGE
sub-RAT123/sub-RAT123.nwb 18.8 kB 18.8 kB 100%  ok       done
Summary:                  18.8 kB 18.8 kB                1 done
                                  100.00%
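As a minimal sketch, the glob-style URL used in the command above can be assembled programmatically. The helper name `asset_glob_url` is hypothetical and not part of the dandi client; the URL layout is copied from the example above, and whether the glob form is a supported long-term interface is an open question here.

```python
from urllib.parse import quote


def asset_glob_url(dandiset_id: str, version: str, asset_path: str,
                   base: str = "https://dandiarchive.org") -> str:
    """Build an asset URL of the `?glob=` form shown in the command above.

    Hypothetical helper for illustration; not a dandi-cli function.
    """
    # Keep "/" and "*" literal so the path and any wildcard stay readable;
    # percent-encode everything else that is not URL-safe (spaces, "#", ...).
    glob = quote(asset_path, safe="/*")
    return f"{base}/dandisets/{dandiset_id}/versions/{version}/assets/?glob={glob}"


url = asset_glob_url("000027", "0.210831.2033", "sub-RAT123/sub-RAT123.nwb")
print(url)
```

The resulting string can then be passed to `dandi download` as in the session above (quoted for the shell, since it contains `?`).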
❯ datalad clone https://github.com/dandisets/000027
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://github.com/dandisets/000027/config download failed: Not Found
[INFO ] access to 2 dataset siblings dandi-dandisets-dropbox, dandiapi not auto-enabled, enable with:
| datalad siblings -d "/tmp/000027" enable -s SIBLING
install(ok): /tmp/000027 (dataset)
❯ cd 000027
❯ datalad get sub-RAT123/sub-RAT123.nwb
get(ok): sub-RAT123/sub-RAT123.nwb (file) [from web...]
❯ ls -lL sub-RAT123/sub-RAT123.nwb
-r--r--r-- 1 yoh yoh 18792 Jul 18 07:51 sub-RAT123/sub-RAT123.nwb
# now edit / dandi upload
For an "ultimate" solution, we need to add some basic zarr navigator, related
to make it easier for a user to get the desired "full" URL to a specific zarr component.
As for updating metadata only: AFAIK it would be quite tricky to implement correctly, but editing metadata is indeed a valid use case. ATM it is 'possible' only via a full zarr download, and I believe we would avoid reuploading any file which was not modified (@jwodder might correct me if I am wrong). As for partial download and upload of a zarr, I think we would also need support for that in the client:
2.b case. Note that we also have datalad datasets for zarr "filesets" too! https://github.com/dandizarrs/ . Frankly, I have not yet tried how dandi upload would behave if we datalad get only some files and then upload such an updated zarr folder while some data files are missing.
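To make the open question above concrete, here is a rough sketch of the bookkeeping a partial upload would need: separating files that changed locally from files that are simply not present locally (for example, annexed files that were never fetched with `datalad get`). This is not how `dandi upload` currently works. The `remote` manifest of `{relative path: md5}` and the function `plan_partial_upload` are hypothetical, and using MD5 per file is an assumption.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 hex digest of a file, read in 1 MiB blocks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def plan_partial_upload(zarr_dir: Path, remote: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare a local zarr folder with a hypothetical remote manifest.

    Returns (to_upload, absent_locally):
      to_upload      - files that are new or whose checksum differs from remote
      absent_locally - files known remotely but not present on disk; these
                       should be left alone rather than treated as deletions
    """
    local = {}
    for p in sorted(zarr_dir.rglob("*")):
        # is_file() is False for directories and for dangling symlinks,
        # which is how un-fetched git-annex content appears on disk.
        if p.is_file():
            local[p.relative_to(zarr_dir).as_posix()] = p
    to_upload = [rel for rel, p in local.items() if remote.get(rel) != md5_of(p)]
    absent_locally = sorted(set(remote) - set(local))
    return to_upload, absent_locally
```

The key design point is the second list: a client that treated every locally missing file as "deleted" would damage the remote zarr after a partial `datalad get`.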
Thanks team. Moving this issue to the DANDI Client repo, as it doesn't seem like we would need changes to the web app or REST API.
Cc @dstansby @kabilar @satra @yarikoptic @waxlamp @balbasty
In the LINC project, @dstansby encountered a scenario where an update was requested for a portion of a Zarr directory. Currently, DANDI and LINC treat a Zarr directory as a single object tree, requiring the entire directory to be downloaded even for updates that only modify specific pieces.
Downloading the entire Zarr directory can be inefficient, especially for large datasets where only a small portion needs updating.
This issue's purpose is to capture the need for a mechanism to allow partial updates of Zarr directories within DANDI and LINC.
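To illustrate why a partial update could touch only a handful of files, the sketch below lists the chunk keys that overlap a given array region. It assumes an unsharded Zarr v2 style layout with one file per chunk and keys formed by joining chunk indices with a separator (`.` by default, `/` in some stores); `touched_chunk_keys` is a hypothetical helper, not part of zarr or the dandi client.

```python
from itertools import product


def touched_chunk_keys(region, chunks, sep="."):
    """Return the keys of chunks overlapped by `region`.

    region: one (start, stop) pair per dimension, stop exclusive
    chunks: chunk length per dimension
    sep:    separator between chunk indices in the key (assumed layout)
    """
    ranges = []
    for (start, stop), size in zip(region, chunks):
        if stop <= start:
            return []  # empty selection touches no chunks
        # First and last chunk index covered along this dimension.
        ranges.append(range(start // size, (stop - 1) // size + 1))
    return [sep.join(map(str, idx)) for idx in product(*ranges)]


# Rows 100..129 and columns 0..63 of an array stored in 64x64 chunks.
keys = touched_chunk_keys(region=[(100, 130), (0, 64)], chunks=(64, 64))
print(keys)  # ['1.0', '2.0']
```

Only those chunk files (plus any changed metadata files) would need to be downloaded, modified, and re-uploaded. With sharding the mapping from region to stored objects differs, so this sketch would not apply as-is.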
Analogously, @satra suggested initially using zarrita to explore elements of sharding, with perhaps the LINC project as a place to test.