dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 28 forks source link

Extend `DandiInstance` to support APIs outside of `dandiarchive.org` subdomains #1517

Open aaronkanzer opened 4 weeks ago

aaronkanzer commented 4 weeks ago

@yarikoptic @jwodder -- wanted to start a GitHub Issue to get consolidation of linc-cli https://github.com/lincbrain/linc-cli back to dandi-cli -- working with @satra @kabilar here to merge LINC work back into DANDI as Ember approaches

Prior to coding a potential solution, wanted to get your thoughts. The only 2 differences that exist between dandi-cli and linc-cli are:

client.dandi_authenticate() is called upon every command in linc-cli due to the locked-down nature of LINC assets for now • LINC has an API Key called LINCBRAIN_API_KEY

There are no other differences, other than the API that is hit of course (api.lincbrain.org/.....)

My proposal here would be to:

  1. Expand the DandiInstance object offerings to include LINC in consts.py -- code reference
  2. Add a is_private_archive boolean to the DandiInstance -- code reference. This bool value would be used to decide upon what commands in dandi-cli would require client.dandi_authenticate()
  3. Include an --api-key flag that could be passed in the CLI commands to handle the API key for any extending instance (would go against the keyring pattern a bit, so curious to get your thoughts especially here)

Let me know -- I'll default to some of your wisdom here as you've authored most of the CLI tool

Cc @waxlamp -- tagging for visibility for Kitware folks

yarikoptic commented 4 weeks ago

Additional aspect on user level visible interface/configuration:

or

satra commented 4 weeks ago

do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI

generally speaking yes. although one could release a linc cli that effectively is a very lightweight wrapper around dandi cli, changing some configs. i think dandi has become quite synonymous with the neurophysiology repository, and it may take some work to turn it into a more general framework. however, i would be keen to do so, if possible.

technically as you raise, naming things like api keys and instances would require some thought and cli awareness.

aaronkanzer commented 4 weeks ago

do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI

generally speaking yes. although one could release a linc cli that effectively is a very lightweight wrapper around dandi cli, changing some configs. i think dandi has become quite synonymous with the neurophysiology repository, and it may take some work to turn it into a more general framework. however, i would be keen to do so, if possible.

technically as you raise, naming things like api keys and instances would require some thought and cli awareness.

@satra Could you provide some base case abstractions you'd like to accomplish here?

aaronkanzer commented 4 weeks ago

Additional aspect on user level visible interface/configuration:

  • do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI and determine which instance to operate based on dandiset metadata at hands or --instance option provided . If it stays so DANDI_API_KEY could still be used for LINC key specification IMHO.

or

  • still have a dedicated linc CLI entry point installed along with dandi which would automagically favor linc and LINC_API_KEY to be the env var?

@yarikoptic I think the idea of inferring the API via reading from the dandiset metadata serves as the lowest cognitive overhead for the end user -- as far as routing the dandi CLI command

@satra @yarikoptic my hesitancy of although one could release a linc cli that effectively is a very lightweight wrapper is that this will just lead to a handful of forks still. DANDI CLI (and LINC) are strongly coupled to the data models and analagous prescribed validations, thus, from an end user perspective, at a certain point, I'd be inclined to rather just use a cloud-vendored issued CLI tool for file management instead of dandi-cli, etc.

You could say -- "ok, users can also bring their own dandischema-ish library" too -- thus all data uploaded to their buckets would have uniformity -- however, abstraction at this time and level seems arduous without user research

satra commented 3 weeks ago

Could you provide some base case abstractions you'd like to accomplish here?

i'm not sure what this means, but the process of uploading is the same for all instances (the data models are the same - to speak to the next comment). so the abstractions are really about where to send and do you have authorization to send there.

i don't think native cloud vendor CLIs (e.g. aws CLI) work for our use cases for various reasons (e.g. we need data model checks, we need validation, and we need to support for objects like zarr). also we don't expect our work to be completely general purpose. the focus would still be neuroscience domain for now, with the scope of the data model being modifiable. also our backend datamodel is relatively simple (blob/container blob + jsonld metadata), and intentionally different from the metadata models for dandiset/asset.

aaronkanzer commented 3 weeks ago

Could you provide some base case abstractions you'd like to accomplish here?

i'm not sure what this means, but the process of uploading is the same for all instances (the data models are the same - to speak to the next comment). so the abstractions are really about where to send and do you have authorization to send there.

i don't think native cloud vendor CLIs (e.g. aws CLI) work for our use cases for various reasons (e.g. we need data model checks, we need validation, and we need to support for objects like zarr). also we don't expect our work to be completely general purpose. the focus would still be neuroscience domain for now, with the scope of the data model being modifiable. also our backend datamodel is relatively simple (blob/container blob + jsonld metadata), and intentionally different from the metadata models for dandiset/asset.

@satra sounds good to me -- I do want to note the frequent usage of --skip-validation flag (Cc @kabilar) for the LINC project, as dandischema is very coupled to all-things DANDI -- are you envisioning all clones conforming to dandischema rules? Do you envision clones submitting PRs to dandischema (for more lenient schema validation) associated with the API that is routed to via dandi-cli?

with the scope of the data model being modifiable -- are you envisioning clones defining their own Django models? Are you envisioning clones leveraging JSON fields to conform to some of DANDI's current data model constructs?

kabilar commented 3 weeks ago

Thanks team. Echoing the points above. For the use case of the LINC project, I think that the following behavior would be straightforward from the user perspective.

  1. A single client (i.e. dandi) that can interact with either the DANDI or LINC instance.
    dandi download -i linc <URL>
  2. Users would define the DANDI_API_KEY with the API key from https://dandiarchive.org or https://lincbrain.org. If we get feedback from LINC users that having a single environment variable for both the DANDI and LINC API keys is cumbersome then we can re-evaluate supporting a LINC_API_KEY.

Since dandischema changes are not currently needed for the LINC project, perhaps we can defer that discussion to the upcoming Generalization of DANDI meetings?

@aaronkanzer Your proposal on how to implement this feature sounds right (quoted below). Would you be able to submit a pull request for this work?

My proposal here would be to:

  1. Expand the DandiInstance object offerings to include LINC in consts.py -- code reference
  2. Add a is_private_archive boolean to the DandiInstance -- code reference. This bool value would be used to decide upon what commands in dandi-cli would require client.dandi_authenticate()
satra commented 3 weeks ago

dandischema is very coupled to all-things DANDI

it is and it isn't. since dandischema supports the lightsheet + mri datasets on DANDI proper, it should also be fine for validating linc datasets. essentially any bids dataset should be supported by the DANDI schema and should upload without skip validation. if that's not the case, something is off and should highlight a reconciliation.

kabilar commented 3 weeks ago

We will also have to update the DANDI and LINC docs. A few examples are below.

  1. Update the DANDI Client docs to explain that the DANDI Client can interact with the DANDI Archive and other instances.
    1. https://github.com/dandi/dandi-cli/blob/6aa414c4db47394970f586cc4fb9758a634aef87/docs/source/index.rst?plain=1#L4-L6
    2. https://github.com/dandi/dandi-cli/blob/6aa414c4db47394970f586cc4fb9758a634aef87/docs/source/cmdline/dandi.rst?plain=1#L8-L9
    3. https://github.com/dandi/dandi-cli/blob/6aa414c4db47394970f586cc4fb9758a634aef87/docs/source/modref/dandiapi.rst?plain=1#L6
    4. https://github.com/dandi/dandi-cli/blob/6aa414c4db47394970f586cc4fb9758a634aef87/dandi/cli/command.py#L73
    5. https://github.com/dandi/dandi-cli/blob/6aa414c4db47394970f586cc4fb9758a634aef87/setup.cfg#L21
    6. https://github.com/dandi/dandi-cli/blob/6aa414c4db47394970f586cc4fb9758a634aef87/README.md?plain=1#L17-L19
  2. Update the DANDI Client docs to replace --dandi-instance with --instance.
    1. https://github.com/dandi/dandi-cli/blob/6aa414c4db47394970f586cc4fb9758a634aef87/docs/source/cmdline/instances.rst?plain=1#L8-L10
  3. Update the LINC web app example download commands to replace lincbrain with dandi.
    1. https://github.com/lincbrain/linc-archive/blob/66f867b3b18553b9b8311e0b5bc0e7e1f57ea778/web/src/views/DandisetLandingView/DownloadDialog.vue#L128-L135
    2. https://github.com/lincbrain/linc-archive/blob/66f867b3b18553b9b8311e0b5bc0e7e1f57ea778/web/src/components/FileBrowser/FileUploadInstructions.vue#L63
  4. Update the LINC docs to replace lincbrain-cli/lincbrain with dandi.
    1. https://docs.lincbrain.org/upload/
    2. https://docs.lincbrain.org/#about-this-doc
kabilar commented 3 weeks ago
  1. Remove lincbrain-cli installation from the LINC Hub
    1. https://github.com/dandi/dandi-hub/blob/ddde7ba3dc0cd0d4927b839dbd7aaac5ff20670f/envs/linc/jupyterhub-overrides.yaml#L96
kabilar commented 3 weeks ago

client.dandi_authenticate() is called upon every command in linc-cli due to the locked-down nature of LINC assets for now

Hi @aaronkanzer, I may have missed it but I didn't notice these calls in the diff pull request (https://github.com/lincbrain/linc-cli/pull/63). We only added client.dandi_authenticate() for the lincbrain move command to fix a bug, but those changes are now also upstream in dandi/dandi-cli.