dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
21 stars 25 forks source link

download: add --keep-leading-path|-k or some mode for `--download` #1460

Closed yarikoptic closed 1 month ago

yarikoptic commented 1 month ago

Inspired by

ATM whenever we download an asset, we download file into current directory (besides if path is a glob as I showed in https://github.com/dandi/dandi-cli/issues/1474). That makes it inconvenient to replicate partial file tree hierarchy as present on the archive. I think we just need an option which would maximally simplify for a user a use case where he/she needs to edit/reupload some existing files. So for a URI which points to a dandiset_id and an asset_path(s), if such option is provided

I wonder though if should be a separate option or some additional "mode switch" for

  --download [dandiset.yaml,assets,all]
                                  Comma-separated list of elements to download
                                  [default: all]

but failing to come up with a descriptive enough value (dandiset+asset?)

jwodder commented 1 month ago

@yarikoptic Why should this option cause dandiset.yaml to be downloaded as well? Downloading an individual asset currently doesn't do that.

jwodder commented 1 month ago

@yarikoptic Also, why should the download path start with {dandiset_id}/? Downloads of other asset URL types (globs & folders) don't have that.

jwodder commented 1 month ago

@yarikoptic Also, should this option have a similar effect on downloads of multi-asset URLs? For example, currently, downloading just a foo/bar/baz/ directory will save everything under baz/, where this option could be used to instead save things under foo/bar/baz/.

yarikoptic commented 1 month ago

@yarikoptic Why should this option cause dandiset.yaml to be downloaded as well? Downloading an individual asset currently doesn't do that.

why: convenience. That is why I wondered if it should be a new mode dandiset.yaml+asset or alike. So it is not really for lower level code, just top level convenience.

@yarikoptic Also, why should the download path start with {dandiset_id}/? Downloads of other asset URL types (globs & folders) don't have that.

again: convenience, abd because dandiset.yaml+asset -- we do create {dandiset_id}/ when --download dandiset.yaml.

@yarikoptic Also, should this option have a similar effect on downloads of multi-asset URLs?

yes

For example, currently, downloading just a foo/bar/baz/ directory will save everything under baz/, where this option could be used to instead save things under foo/bar/baz/.

correct, we want to retain all leading paths. In case of multiple URLs, possibly multiple different folders, or even {dandiset_id}/ leading directories if we are not under a dandiset already.

kabilar commented 1 month ago

Thank you @yarikoptic and @jwodder. All of the options you describe above will be very helpful. I have needed this and several users have requested this feature.

jwodder commented 1 month ago

@yarikoptic Rather than adding a dandiset.yaml+asset mode, wouldn't it make more sense to change the all mode to have this new behavior and reserve the old behavior for just asset mode? I realize this is a breaking change, but semver lets us do that for 0.x.0 releases.

yarikoptic commented 1 month ago

wouldn't it make more sense to change the all mode to have this new behavior

I think current behavior is ok for all, as if you specify just a URL to a file it is IMHO more reasonable to just store that file without leading paths etc, and overall it does not impose much of checks/assumptions. The dandiset.yaml+asset mode would be non-default convenience to facilitate uses cases to modify (download/reupload) the file(s).

jwodder commented 1 month ago

@yarikoptic New suggestion: We implement this as a new --download mode, but call it "structure" or similar (as it preserves the directory structure and, by downloading dandiset.yaml, the Dandiset structure) instead of the confusing & unwieldy "dandiset.yaml+asset."

kabilar commented 1 month ago

@yarikoptic New suggestion: We implement this as a new --download mode, but call it "structure" or similar (as it preserves the directory structure and, by downloading dandiset.yaml, the Dandiset structure) instead of the confusing & unwieldy "dandiset.yaml+asset."

Thanks @jwodder @yarikoptic. I like this idea, and think that the --download mode of structure or tree would be intuitive.

Based on the discussion in https://github.com/dandi/dandi-cli/issues/1464, we may also need to include the dataset_description.json file to handle the above use case of download/edit/reupload.

kabilar commented 1 month ago

cc @aaronkanzer @dstansby @balbasty

The team is working to replicate the partial file tree hierarchy upon download to handle the use case of editing existing files.

yarikoptic commented 1 month ago

@yarikoptic New suggestion: We implement this as a new --download mode, but call it "structure" or similar (as it preserves the directory structure and, by downloading dandiset.yaml, the Dandiset structure) instead of the confusing & unwieldy "dandiset.yaml+asset."

I am ok with an alternative name. Nothing better than "structure" comes to mind ATM -- would be easy to change during PR review if some better choice strikes someone.

yarikoptic commented 1 month ago

Based on the discussion in dandi/dandi-archive#1464, we may also need to include the dataset_description.json file to handle the above use case of download/edit/reupload.

For this one I think we might want first to consider some alternatives, e.g. explicit --dataset-layout auto,bids,dandi option or something like that. dandiset.yaml is needed to deduce root of the dandiset and its id.

github-actions[bot] commented 4 weeks ago

:rocket: Issue was released in 0.63.0 :rocket: