dandi / dandidav

WebDAV view to DANDI Archive
MIT License

Add text instructions on download/access #179

Open yarikoptic opened 1 month ago

yarikoptic commented 1 month ago

Prompted by @jwodder in https://github.com/dandi/dandi-archive/issues/1993#issuecomment-2273593108: it would be nice UX, similar to how https://datasets.datalad.org/ informs users about datalad install instructions, if we provided here a wget invocation to download an entire zarr, or otherwise a specific dandiset or one of its folders. We also already have

which similarly suggests integration with external services to instruct users on how to interact with particular files or zarrs.

jwodder commented 1 month ago

@yarikoptic Problem: wget's "recursive" mode is limited to a maximum depth of 5 directories by default. Possible ways to address this are:

jwodder commented 1 month ago

@yarikoptic Further problems:

At the moment, my best wget command is:

wget \
    --recursive \
    --span-hosts \
    --domains=webdav.dandiarchive.org,api.dandiarchive.org \
    --no-parent \
    --content-disposition \
    --reject "index.html*" \
    https://webdav.dandiarchive.org/dandisets/000027/releases/0.210831.2033/

which downloads:

./
├── api.dandiarchive.org/
│   └── api/
│       └── dandisets/
│           └── 000027/
│               └── versions/
│                   └── 0.210831.2033/
│                       └── assets/
│                           └── 1c095f5f-d1e2-45db-b807-fdcfea08c6de/
├── dandiarchive.s3.amazonaws.com/
│   └── blobs/
│       └── 2db/
│           └── af0/
│               └── sub-RAT123.nwb
└── webdav.dandiarchive.org/
    └── dandisets/
        └── 000027/
            └── releases/
                └── 0.210831.2033/
                    ├── dandiset.yaml
                    └── sub-RAT123/

yarikoptic commented 1 month ago

> @yarikoptic Problem: wget's "recursive" mode is limited to a maximum depth of 5 directories by default.

I had no idea! I think we are doomed to add/use --level=inf, since we never really cared about recording/reflecting the depth of the zarr anywhere*. Indeed, --no-parent would be mandatory, and so it should sit "near" it in the command line. We could also add --quota with e.g. 101% of the zarr size, but I am not sure whether that is a good idea or whether it adds any real protection.

* In hindsight, I might have suggested including it in the checksum, but that would likely be "too much". Do you think it would be useful to discuss this aspect?

Actually -- we are in control of manifest generation, we can extract/include that info in the manifest!
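
A minimal sketch of the depth idea, assuming (hypothetically) that a Zarr's entry paths can be listed one per line, relative to the Zarr root. The actual manifest format is not shown in this thread, and the helper name `max_depth` is made up for illustration. It reports the largest number of path components; how that number maps onto wget's --level has not been verified here.

```shell
# Hypothetical helper: read entry paths (one per line, relative to the
# Zarr root) on stdin and print the largest number of path components.
max_depth() {
    awk -F/ '
        { n = 0; for (i = 1; i <= NF; i++) if ($i != "") n++ }
        n > max { max = n }
        END { print max + 0 }
    '
}

# Made-up example entries; real paths would come from the manifest.
printf '%s\n' '.zgroup' '0/.zarray' '0/0/0/0/0/0' | max_depth
```

If such a value were recorded in the manifest, it could in principle replace --level=inf for Zarrs; for Dandisets and plain folders no such number would be available.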

jwodder commented 1 month ago

@yarikoptic

> we are in control of manifest generation, we can extract/include that info in the manifest!

I got the impression you wanted this for Dandisets and folders within them as well, not just Zarrs.

yarikoptic commented 1 month ago

> @yarikoptic
>
> > we are in control of manifest generation, we can extract/include that info in the manifest!
>
> I got the impression you wanted this for Dandisets and folders within them as well, not just Zarrs.

Right, I did indeed want that... For those we are doomed to just hope that --no-parent works out and that wget does not crawl away from the original hierarchy.
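
For reference, the adjustments discussed in this thread (--level=inf, with --no-parent kept right next to --recursive) could be folded into the earlier wget command roughly as follows. This is an untested sketch: `wget_cmd` is a hypothetical wrapper that only prints the command, so that it can be reviewed before anyone runs it.

```shell
# Hypothetical wrapper: print (do not run) a wget command for a WebDAV URL.
wget_cmd() {
    url=$1
    # Unquoted heredoc: "\\" yields a literal backslash, $url is expanded.
    cat <<EOF
wget \\
    --recursive --level=inf --no-parent \\
    --span-hosts \\
    --domains=webdav.dandiarchive.org,api.dandiarchive.org \\
    --content-disposition \\
    --reject "index.html*" \\
    $url
EOF
}

wget_cmd https://webdav.dandiarchive.org/dandisets/000027/releases/0.210831.2033/
```

The remaining flags are copied unchanged from the command posted earlier in the thread; only --level=inf and the position of --no-parent differ.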

jwodder commented 1 month ago

@yarikoptic I did manage to figure out an rclone command to download a folder nicely:

rclone copy \
    --webdav-url https://webdav.dandiarchive.org \
    :webdav:dandisets/000027/releases/0.210831.2033/ \
    0.210831.2033/

Should we use this instead of wget? Are there any other download commands we should list or consider listing in addition or instead?
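
To show how the rclone command above might be generated for an arbitrary folder, here is a small sketch. The `rclone_cmd` function is hypothetical and only prints the command; it assumes the local destination should be named after the last component of the WebDAV path, as in the example above.

```shell
# Hypothetical wrapper: print (do not run) an rclone command that copies
# one WebDAV folder into a local directory named after its last component.
rclone_cmd() {
    path=${1%/}        # strip one trailing slash, if present
    dest=${path##*/}   # last path component, e.g. 0.210831.2033
    # Unquoted heredoc: "\\" yields a literal backslash, variables expand.
    cat <<EOF
rclone copy \\
    --webdav-url https://webdav.dandiarchive.org \\
    :webdav:$path/ \\
    $dest/
EOF
}

rclone_cmd dandisets/000027/releases/0.210831.2033/
```

Adding rclone's `--dry-run` flag to the printed command should preview what would be copied without transferring anything.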

jwodder commented 1 month ago

@yarikoptic Ping.

yarikoptic commented 3 weeks ago

Depending on how we present it -- maybe we want both? E.g. if there could be multiple tabs (wget, rclone, dandi CLI, and maybe even Python, etc.), then people could choose whichever they have or prefer. I haven't looked into whether there is a simple HTML/CSS/JS way to make that happen, though. WDYT?

jwodder commented 3 weeks ago

@yarikoptic Worrying about how the data is presented is getting ahead of ourselves and ultimately not that important. I'm currently interested in what data should be presented.

yarikoptic commented 3 weeks ago

Then let's present both -- the ugly wget and the neater, WebDAV-aware rclone.