Open aaronkanzer opened 4 weeks ago
Additional aspect on user level visible interface/configuration:
linc
* instances as just sample instances of dandi and otherwise still use dandi
CLI and determine which instance to operate based on dandiset metadata at hands or --instance
option provided . If it stays so DANDI_API_KEY
could still be used for LINC key specification IMHO.or
linc
CLI entry point installed along with dandi
which would automagically favor linc
and LINC_API_KEY
to be the env var? do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI
generally speaking yes. although one could release a linc cli that effectively is a very lightweight wrapper around dandi cli, changing some configs. i think dandi has become quite synonymous with the neurophysiology repository, and it may take some work to turn it into a more general framework. however, i would be keen to do so, if possible.
technically as you raise, naming things like api keys and instances would require some thought and cli awareness.
do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI
generally speaking yes. although one could release a linc cli that effectively is a very lightweight wrapper around dandi cli, changing some configs. i think dandi has become quite synonymous with the neurophysiology repository, and it may take some work to turn it into a more general framework. however, i would be keen to do so, if possible.
technically as you raise, naming things like api keys and instances would require some thought and cli awareness.
@satra Could you provide some base case abstractions you'd like to accomplish here?
Additional aspect on user level visible interface/configuration:
- do we want to see
linc
* instances as just sample instances of dandi and otherwise still usedandi
CLI and determine which instance to operate based on dandiset metadata at hands or--instance
option provided . If it stays soDANDI_API_KEY
could still be used for LINC key specification IMHO.or
- still have a dedicated
linc
CLI entry point installed along withdandi
which would automagically favorlinc
andLINC_API_KEY
to be the env var?
@yarikoptic I think the idea of inferring the API via reading from the dandiset metadata serves as the lowest cognitive overhead for the end user -- as far as routing the dandi
CLI command
@satra @yarikoptic my hesitancy of although one could release a linc cli that effectively is a very lightweight wrapper
is that this will just lead to a handful of forks still. DANDI CLI (and LINC) are strongly coupled to the data models and analagous prescribed validations, thus, from an end user perspective, at a certain point, I'd be inclined to rather just use a cloud-vendored issued CLI tool for file management instead of dandi-cli
, etc.
You could say -- "ok, users can also bring their own dandischema-ish library" too -- thus all data uploaded to their buckets would have uniformity -- however, abstraction at this time and level seems arduous without user research
Could you provide some base case abstractions you'd like to accomplish here?
i'm not sure what this means, but the process of uploading is the same for all instances (the data models are the same - to speak to the next comment). so the abstractions are really about where to send and do you have authorization to send there.
i don't think native cloud vendor CLIs (e.g. aws CLI) work for our use cases for various reasons (e.g. we need data model checks, we need validation, and we need to support for objects like zarr). also we don't expect our work to be completely general purpose. the focus would still be neuroscience domain for now, with the scope of the data model being modifiable. also our backend datamodel is relatively simple (blob/container blob + jsonld metadata), and intentionally different from the metadata models for dandiset/asset.
Could you provide some base case abstractions you'd like to accomplish here?
i'm not sure what this means, but the process of uploading is the same for all instances (the data models are the same - to speak to the next comment). so the abstractions are really about where to send and do you have authorization to send there.
i don't think native cloud vendor CLIs (e.g. aws CLI) work for our use cases for various reasons (e.g. we need data model checks, we need validation, and we need to support for objects like zarr). also we don't expect our work to be completely general purpose. the focus would still be neuroscience domain for now, with the scope of the data model being modifiable. also our backend datamodel is relatively simple (blob/container blob + jsonld metadata), and intentionally different from the metadata models for dandiset/asset.
@satra sounds good to me -- I do want to note the frequent usage of --skip-validation
flag (Cc @kabilar) for the LINC project, as dandischema
is very coupled to all-things DANDI -- are you envisioning all clones conforming to dandischema
rules? Do you envision clones submitting PRs to dandischema
(for more lenient schema validation) associated with the API that is routed to via dandi-cli
?
with the scope of the data model being modifiable
-- are you envisioning clones defining their own Django models? Are you envisioning clones leveraging JSON fields to conform to some of DANDI's current data model constructs?
Thanks team. Echoing the points above. For the use case of the LINC project, I think that the following behavior would be straightforward from the user perspective.
dandi
) that can interact with either the DANDI or LINC instance.
dandi download -i linc <URL>
DANDI_API_KEY
with the API key from https://dandiarchive.org or https://lincbrain.org. If we get feedback from LINC users that having a single environment variable for both the DANDI and LINC API keys is cumbersome then we can re-evaluate supporting a LINC_API_KEY
.Since dandischema
changes are not currently needed for the LINC project, perhaps we can defer that discussion to the upcoming Generalization of DANDI
meetings?
@aaronkanzer Your proposal on how to implement this feature sounds right (quoted below). Would you be able to submit a pull request for this work?
My proposal here would be to:
- Expand the
DandiInstance
object offerings to includeLINC
inconsts.py
-- code reference- Add a
is_private_archive
boolean to theDandiInstance
-- code reference. This bool value would be used to decide upon what commands indandi-cli
would requireclient.dandi_authenticate()
dandischema is very coupled to all-things DANDI
it is and it isn't. since dandischema supports the lightsheet + mri datasets on DANDI proper, it should also be fine for validating linc datasets. essentially any bids dataset should be supported by the DANDI schema and should upload without skip validation. if that's not the case, something is off and should highlight a reconciliation.
We will also have to update the DANDI and LINC docs. A few examples are below.
--dandi-instance
with --instance
.
lincbrain
with dandi
.
lincbrain-cli
/lincbrain
with dandi
.
lincbrain-cli
installation from the LINC Hub
•
client.dandi_authenticate()
is called upon every command inlinc-cli
due to the locked-down nature of LINC assets for now
Hi @aaronkanzer, I may have missed it but I didn't notice these calls in the diff pull request (https://github.com/lincbrain/linc-cli/pull/63). We only added client.dandi_authenticate()
for the lincbrain move
command to fix a bug, but those changes are now also upstream in dandi/dandi-cli
.
@yarikoptic @jwodder -- wanted to start a GitHub Issue to get consolidation of
linc-cli
https://github.com/lincbrain/linc-cli back todandi-cli
-- working with @satra @kabilar here to merge LINC work back into DANDI as Ember approachesPrior to coding a potential solution, wanted to get your thoughts. The only 2 differences that exist between
dandi-cli
andlinc-cli
are:•
client.dandi_authenticate()
is called upon every command inlinc-cli
due to the locked-down nature of LINC assets for now • LINC has an API Key calledLINCBRAIN_API_KEY
There are no other differences, other than the API that is hit of course (
api.lincbrain.org/.....
)My proposal here would be to:
DandiInstance
object offerings to includeLINC
inconsts.py
-- code referenceis_private_archive
boolean to theDandiInstance
-- code reference. This bool value would be used to decide upon what commands indandi-cli
would requireclient.dandi_authenticate()
--api-key
flag that could be passed in the CLI commands to handle the API key for any extending instance (would go against thekeyring
pattern a bit, so curious to get your thoughts especially here)Let me know -- I'll default to some of your wisdom here as you've authored most of the CLI tool
Cc @waxlamp -- tagging for visibility for Kitware folks