dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 25 forks source link

dandi upload --dandiset-path? #892

Closed bendichter closed 2 years ago

bendichter commented 2 years ago

dandi upload --help mentions a --dandiset-path arg, but I don't see one in the options. What is the proper way to set this without being in that directory?

$ dandi upload --help
Usage: dandi upload [OPTIONS] [PATHS]...

  Upload Dandiset files to DANDI Archive.

  The target Dandiset to upload to must already be registered in the archive,
  and a `dandiset.yaml` file must exist in the local `--dandiset-path`.

  Local Dandiset should pass validation.  For that, the assets should first be
  organized using the `dandi organize` command.

  By default all .nwb files in the Dandiset (excluding directories starting
  with a period) will be considered for the upload.  You can point to specific
  files you would like to validate and have uploaded.

Options:
  -e, --existing [error|skip|force|overwrite|refresh]
                                  What to do if a file found existing on the
                                  server. 'skip' would skipthe file, 'force' -
                                  force reupload, 'overwrite' - force upload
                                  if either size or modification time differs;
                                  'refresh' - upload only if local
                                  modification time is ahead of the remote.
                                  [default: refresh]
  -J, --jobs N[:M]                Number of files to upload in parallel and,
                                  optionally, number of upload threads per
                                  file
  --sync                          Delete assets on the server that do not
                                  exist locally
  --validation [require|skip|ignore]
                                  Data must pass validation before the upload.
                                  Use of this option is highly discouraged.
                                  [default: require]
  -i, --dandi-instance [dandi|dandi-api-local-docker-tests|dandi-devel|dandi-staging]
                                  DANDI instance to use  [env var:
                                  DANDI_INSTANCE;default: dandi]
  --help                          Show this message and exit.
jwodder commented 2 years ago

@bendichter The code that adds the --dandiset-path option is currently commented out for some reason, so you have no choice but to be in the Dandiset directory or a subdirectory thereof.

@yarikoptic Should the --dandiset-path option be uncommented, or should the references to it be removed from the documentation?

bendichter commented 2 years ago

@jwodder thanks for the quick response! I would very much like to be able to upload a directory that is not the cwd

jwodder commented 2 years ago

@bendichter What exactly is preventing you from cding to that directory before running dandi upload?

If you really need to upload a non-PWD directory, you'll have to call dandi.upload.upload() yourself; cf. how dandi upload does it.

bendichter commented 2 years ago

@jwodder

I am confused by the upload function as well. It has a required arg of paths and an optional arg of dandiset_path, and no docstring to explain the difference between these args. There is also no type hint to indicate what type "paths" should be. A list of strings? Should I pass in an empty list and use the dandiset_path optional arg? Could you please add a docstring so I know how to use this function?

jwodder commented 2 years ago

@bendichter Type hints were added in #887; the function looks like this now:

https://github.com/dandi/dandi-cli/blob/11a95f184de1c482c4c3a073a1c59fdeaca02fce/dandi/upload.py#L29-L41

I'm not going to write out a full docstring right now, but for the arguments in question:

bendichter commented 2 years ago

@jwodder ok thanks that's what I needed to know

bendichter commented 2 years ago

@jwodder how would you feel about making paths optional?

jwodder commented 2 years ago

@bendichter Done: https://github.com/dandi/dandi-cli/pull/899

yarikoptic commented 2 years ago

@bendichter What exactly is preventing you from cding to that directory before running dandi upload?

and wouldn't dandi upload the-path-to-dandiset-top-directory/ work? --dandiset-path is really to just establish where the "top" of the dandiset is, may be too inspired by datalad's UI.

edit: i.e. should we do anything extra for this issue or just consider it resolved?

jwodder commented 2 years ago

@yarikoptic That wouldn't work, as upload would still start looking in the current directory for the dandiset.yaml file. If an explicit path given on the command line isn't under that, you'll get an error.

should we do anything extra for this issue or just consider it resolved?

We need to decide whether to uncomment --dandset-path or remove its references from the documentation.

yarikoptic commented 2 years ago

Dandiset.find (which is ran on dandiset_path) already uses find_parent_directory_containing and thus possibly ascending path hierarchy, so dandiset_path can point to some other location within dandiset and thus being not really "strict" for what --dandiset-path points to.

So, overall I am still not sure on whether we should keep that dandiset_path at all since it doesn't really serve any pragmatic purpose (unlike in datalad where it might define the "scope" of operation in particular across dataset boundaries), and instead just make handling of paths a bit smarter.

WDYT @jwodder ?

jwodder commented 2 years ago

@yarikoptic That sounds like a good strategy, but I think we should keep the Dandiset-finding behavior the same between dandi upload and dandi organize (the only command that currently has a --dandiset-path option). For the record, its current behavior is: if --dandiset-path is specified, use that (without recursing upwards); otherwise, recurse upwards from the current directory in search of dandiset.yaml. What strategy should be preferred for both commands?

yarikoptic commented 2 years ago

re behavior if command has a --dandiset-path argument -- totally agree, they should behave identically. organize seems to use Dandiset.find from CWD only no dataset-path is specified, and if specified -- takes it as is, not search is performed. I think such behavior is ok and explicit.

organize does need --dandiset-path as the one to provide location where to organize into if cannot be deduced from CWD.

upload -- I feel it does not need --dandiset-path since if paths are specified -- they pretty much instruct about the location of dandiset-path (hence above comment), and if not specified -- deduce from CWD to upload entire dandiset user is under. Then we could just remove handling of dandiset_path from upload and not bother harmonizing its behavior to organize. Agree?

jwodder commented 2 years ago

@yarikoptic Agreed: https://github.com/dandi/dandi-cli/pull/916