dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

dandi register/upload/update workflow #41

Closed satra closed 2 years ago

satra commented 4 years ago

register/upload process

  1. Register a new dataset online and use the UI to add minimal metadata. This should also:
    • create a staged folder, set the identifier to the new dataset id, and set the version to "staging"
    • create a dandiset.yaml file within the staged folder
    • display the dataset id for the user to use with the CLI
  2. Retrieve an authorization token from the UI
  3. Organize files into the dataset
    • retrieve the dataset
    • this should update the dandiset.yaml file
  4. Upload the dataset to dandi
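The staging step above could be pictured as follows. This is a hypothetical sketch, not the archive's actual code: the field names (`identifier`, `version`) and the hand-written YAML are guesses based on this thread, not the real schema.

```python
import tempfile
from pathlib import Path

def stage_dandiset(root: Path, dandiset_id: str) -> Path:
    """Create a staged folder holding a skeleton dandiset.yaml.

    Hypothetical sketch only: the real file would be produced by the
    archive/CLI and carry the full metadata schema.
    """
    staged = root / dandiset_id
    staged.mkdir(parents=True, exist_ok=True)
    yaml_path = staged / "dandiset.yaml"
    # Hand-written minimal YAML so the sketch needs no dependencies;
    # these field names are assumptions, not the real schema.
    yaml_path.write_text(f"identifier: {dandiset_id}\nversion: staging\n")
    return yaml_path

# Example with a made-up dandiset id
path = stage_dandiset(Path(tempfile.mkdtemp()), "000001")
print(path.read_text())
```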

update process questions:

  1. what is the reference dataset? local, remote, collaborator's
  2. across contributors in a team, how are operations (add/delete) synchronized?
yarikoptic commented 4 years ago

forgot about the token! so pretty much it should be "register on dandiarchive and retrieve authorization token" before the current "5. Use dandi register ...".

I am ok with making the workflow more dandiarchive-centric and will absorb the current 1-4 (prepare nwb ... validate) as a detailed version of step 4 ("organize files") proposed here. Will look into that tomorrow.

Re "update"

  1. reference -- I guess it depends on the step in the workflow. If I am a new collaborator, it would be dandiarchive's version. If I am the initial contributor to the archive, the reference would be local, since I will keep uploading new versions. So I think there cannot be a single answer.
  2. sync -- without introducing VCS features (e.g. establishing a trail of file UUIDs regenerated by pynwb upon save, which may be just sufficient; filed #43) we probably cannot provide much beyond some safeguards -- by default, do not allow uploading versions older than those present on dandi, and do not download over newer local files. So it would be up to researchers not to shoot themselves in the foot, not unlike working with some shared folder on google or box.com. edit: without a proper VCS in the back we would not be able to provide proper support for deletions, renames etc. anyway, so for now I would just rely on the aforementioned basic safeguards until we (re)evaluate backend choices.
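The safeguards described in point 2 could be as simple as a timestamp comparison. A hypothetical sketch (the real client may well key on checksums or server-side versions instead of mtimes):

```python
from datetime import datetime, timezone
from typing import Optional

def upload_allowed(local_mtime: datetime,
                   remote_mtime: Optional[datetime]) -> bool:
    """Safeguard: refuse to upload a copy older than what dandi has."""
    if remote_mtime is None:  # new file, nothing to clobber
        return True
    return local_mtime >= remote_mtime

def download_allowed(local_mtime: Optional[datetime],
                     remote_mtime: datetime) -> bool:
    """Safeguard: refuse to overwrite a local file newer than the remote."""
    if local_mtime is None:  # file not present locally yet
        return True
    return remote_mtime >= local_mtime

older = datetime(2020, 1, 1, tzinfo=timezone.utc)
newer = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(upload_allowed(older, newer))    # False: local copy is stale
print(download_allowed(newer, older))  # False: local edits are newer
```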
satra commented 4 years ago

> not unlike working with some shared folder on google or box.com

but those provide versioning. so they can at least recover deleted files.

while not super urgent, i think we would want to move to better versioning soon. the collaborative write process is always hairy!

yarikoptic commented 4 years ago

ha -- I just added an edit about VCS - I am all for discussing versioning ;-)

mgrauer commented 4 years ago

I think we should not do "create a dandiset.yaml file within the staged folder" for now, as that will break metadata editing in the UI. Let's hold off on that until we've thought through the metadata editing/versioning lifecycle, with both UI and CLI.

satra commented 4 years ago

when the cli downloads the dataset it will be forced to create a local dandiset.yaml. the nwb-related metadata will be extracted and added to the dandiset.yaml, and this will be pushed back by the client. since it is a cli, we could do the conversion (folder metadata <--> file) behind the scenes.
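One way to picture the folder-metadata <--> file conversion described above: fold metadata extracted from the NWB files into whatever the dandiset record already holds, letting user-edited values win. A hypothetical sketch; the function name, the field names, and the precedence rule are all assumptions, and the real extraction lives in the CLI:

```python
def merge_metadata(dandiset_meta: dict, nwb_meta: dict) -> dict:
    """Fold NWB-derived fields into the dandiset record without
    clobbering values the user already edited (e.g. via the UI).
    Hypothetical sketch: precedence rules are an assumption."""
    merged = dict(nwb_meta)
    merged.update(dandiset_meta)  # user-edited values take priority
    return merged

existing = {"identifier": "000001", "name": "My dandiset"}
extracted = {"species": "Mus musculus", "name": "auto-derived title"}
print(merge_metadata(existing, extracted))
# the user-set "name" is kept; the extracted "species" is added
```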

temporarily we could also enable a checkout flag, such that a team member can checkout the dandiset for modification, but other team members can overwrite the checkout. we leave it to a team's social networking to resolve this.
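The checkout flag could be no more than an advisory marker file that any collaborator may overwrite, matching the "leave it to a team's social networking" idea above. A hypothetical sketch; the marker filename and function are made up, not real CLI behavior:

```python
import tempfile
from pathlib import Path
from typing import Optional

LOCK_NAME = "CHECKED_OUT_BY"  # hypothetical marker filename

def checkout(dandiset_dir: Path, user: str) -> Optional[str]:
    """Record who is editing the dandiset; return the previous holder
    if we overwrote someone's checkout (advisory only, by design)."""
    marker = dandiset_dir / LOCK_NAME
    previous = marker.read_text().strip() if marker.exists() else None
    marker.write_text(user + "\n")  # anyone can overwrite the marker
    return previous

d = Path(tempfile.mkdtemp())
print(checkout(d, "alice"))  # None: first checkout
print(checkout(d, "bob"))    # "alice": bob overwrote alice's checkout
```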

yarikoptic commented 2 years ago

I think this workflow was largely implemented and documented within the handbook, so no further action is needed on this particular issue. If more work is needed, separate dedicated issues could be filed referencing this one.