mgrauer closed this issue 4 years ago
@satra @yarikoptic
@mgrauer - thank you for bringing this up. i feel we need to have this discussion alongside the API discussion and the BICCN dataset/file manifest discussions. in general, yes, i would like the contract to be stable for dandi-cli. in addition, the reason i bring up the API is that other people should be able to write clients (command line, web, etc.) that can read/write data to the archive, and we should decide if our endpoints should be RESTful or something else.
the API discussion is also tied to the dandiset/asset schema discussion and our search discussion. so would it be ok if i focus on creating a draft of our data model and then we come back to this in the next week or two?
regarding this specific suggestion:
dandi download dandiset <dandisetid>
dandi download subject <subjectId>
dandi download nwb <filename>
on one hand i think this is clear and speaks to an entity based representation, but for this to work every id/filename would need to be unique.
on the other hand the following (modeled after github) could be both unique and resolvable by a user or a client, and could map onto s3 for people simply using the aws s3 cli.
dandi download <prefix>/00001
dandi download <prefix>/00001/<version id>/subjectid
dandi download <prefix>/00001/<version id>/subjectid/path/to/file
where <version id> needs to be discussed as you note and <prefix> could be https://dandiarchive.org/dandiset/ or DANDI as we have that mapped in identifiers.org or a new prefix like https://raw.dandiarchive.org/.
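To make the path-style proposal concrete, here is a minimal Python sketch of how a client might split such an identifier into its parts. Everything in it is an assumption for illustration: the names `ResourcePath`, `parse_resource`, and `KNOWN_PREFIXES` are hypothetical, the prefixes are just the candidates listed above, and the version id is treated as an opaque string since its format is still undecided.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prefix list: just the candidates floated in this thread.
KNOWN_PREFIXES = (
    "https://dandiarchive.org/dandiset",
    "https://raw.dandiarchive.org",
    "DANDI",
)


@dataclass
class ResourcePath:
    """Components of <prefix>/<dandiset>/<version id>/<subject>/<path>."""
    dandiset_id: str
    version_id: Optional[str] = None   # format undecided; kept opaque
    subject_id: Optional[str] = None
    file_path: Optional[str] = None


def parse_resource(identifier: str) -> ResourcePath:
    """Strip a known prefix and split the remainder on '/'."""
    for prefix in KNOWN_PREFIXES:
        rest = identifier[len(prefix):]
        # Accept both "DANDI/00001" and "DANDI:00001" style separators.
        if identifier.startswith(prefix) and rest[:1] in ("/", ":"):
            break
    else:
        raise ValueError(f"unrecognized prefix in {identifier!r}")

    parts = [p for p in rest[1:].split("/") if p]
    if not parts:
        raise ValueError(f"missing dandiset id in {identifier!r}")

    return ResourcePath(
        dandiset_id=parts[0],
        version_id=parts[1] if len(parts) > 1 else None,
        subject_id=parts[2] if len(parts) > 2 else None,
        file_path="/".join(parts[3:]) or None,
    )
```

Because each level is positional, the same parser handles a whole dandiset, a subject, or a single file, which is the property that would let the scheme map onto s3 key prefixes.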
this would allow us to also do bids datasets without changing any semantics of the url(**).
thus before we decide on which route we should take, i would like to present to the dev team a data model, metadata, and some views of this model.
** this is not completely true because of how we are encoding dandiset metadata.
Thank you @mgrauer and @satra! I am on board with both of you. Satra expressed my thoughts at large and addressed the aspect of being able to support different instantiations of the dandi archive. We should chat ;-)
would it be ok if i focus on creating a draft of our data model and then we come back to this in the next week or two?
Yes, that's fine, especially given that it sounds like @yarikoptic is on board with waiting for the discussion. My motivation for this issue is that @yarikoptic has been feeling some pain lately dealing with the client-server interaction, so I wanted to start addressing that.
There are a few different issues that are related (and note that these cover quite a bit of the roadmap over the next six months):
We are planning to demo a prototype of the publish workflow at the next meeting, so that should generate some good discussion. This prototype won't include any versioning as that adds complexity and still needs to be worked out, but at least will get us a basic end-to-end workflow in place that we can start iterating on.
as for "download", let's continue on https://github.com/dandi/dandi-cli/issues/183 . Note that we already have support for simple DANDI:<ID> (coming in 0.6.0).
For anything else, let's open an issue if none exists yet. Most of the operations that interact with the archive should in general be lean interfaces to the API.
Please feel free to reopen, or transfer into dandi-cli if you feel that it is worth reincarnating it.
I'd like to suggest a stable contract for the dandi-cli, moving away from passing URLs to the CLI, and I'd want to start with downloads.
Currently, the dandi-cli takes in URLs from the command line, and then does parsing and dispatching. This is prone to breakage because we don't have a stable set of URLs, and it also requires the user to understand the meaning of the URLs and our various services (currently Girder and the GUI) which are implementation details that would be better hidden from the user.
I'd like to propose something like the following commands for download. I'm starting with download for three reasons: it is simpler because it is read-only; it will likely have a wider user base than other commands (my guess is that many people will want to download datasets without any other interaction with the system, a shallower but more common interaction, whereas a smaller number of users will have deeper interactions around building draft dandisets and publication); and it is more straightforward to provide UI support for it. Other commands could probably follow suit later, after we work out download.
The cli could support the following commands:
dandi download dandiset 00001 (we would need some provision for supporting versions, but the first thing to do is to support downloading the latest published version)
dandi download subject <subjectId> (maybe this is a Girder folderId or a subject name)
dandi download nwb <filename> (maybe this is a Girder fileId or a file name)

Then we can have on the UI a component such as what GitHub has, that allows the user to easily see what command to run on the cli for a particular resource (see screenshot). E.g., when you are on the dandiset landing page for dandiset 000001 it has a UI button labeled "download" and when you click it, it tells you "Use the CLI (and here's how to install it and run it), and run dandi download dandiset 000001".

This way, the CLI commands are more stable, we are providing a more limited set of commands to support, and the cli doesn't have to parse URLs. Rather we get to define which behaviors we want to support, and within the functions that provide those behaviors, build up the commands and URLs we need to support that behavior. If the URLs change, then we can update the specific places that rely on the URLs which will be built up, rather than having to change parsing code (which is never fun).
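As a rough illustration of the idea, here is a minimal sketch of entity-based download subcommands where each behavior builds its own URL. It uses the standard library's `argparse` only to stay self-contained and is not how dandi-cli is implemented. `API_BASE`, the URL layouts, and the names `URL_BUILDERS`, `build_parser`, and `resolve` are all hypothetical placeholders, since the real endpoints are not specified here.

```python
import argparse

# Hypothetical base URL and paths; the real server endpoints are not
# defined in this issue and may look nothing like this.
API_BASE = "https://api.example.org"


def dandiset_url(dandiset_id: str) -> str:
    return f"{API_BASE}/dandisets/{dandiset_id}"


def subject_url(subject_id: str) -> str:
    return f"{API_BASE}/subjects/{subject_id}"


def nwb_url(filename: str) -> str:
    return f"{API_BASE}/files/{filename}"


# One builder per supported behavior. If the server URLs change, only
# these functions need updating; no URL parsing code is involved.
URL_BUILDERS = {
    "dandiset": dandiset_url,
    "subject": subject_url,
    "nwb": nwb_url,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dandi")
    commands = parser.add_subparsers(dest="command", required=True)
    download = commands.add_parser("download", help="download a resource")
    download.add_argument("kind", choices=sorted(URL_BUILDERS))
    download.add_argument("identifier")
    return parser


def resolve(argv: list) -> str:
    """Map `download <kind> <identifier>` to the URL the client would fetch."""
    args = build_parser().parse_args(argv)
    return URL_BUILDERS[args.kind](args.identifier)


if __name__ == "__main__":
    import sys
    print(resolve(sys.argv[1:]))
```

With this shape, `resolve(["download", "dandiset", "000001"])` returns the URL the client would fetch, and the set of supported `kind` values is the contract the server side has to keep working.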
Overall, importantly, this is the start of a contract between the server and the cli about which sets of behaviors we support, instead of just whatever URLs happen to exist on the server at a given time. I think this will also make it much easier for people working on the server side to update the client.