Open yarikoptic opened 1 year ago
is there something that prevents detecting layout based on either local/remote dandiset.yaml + dataset_description.json ? if dataset_description
exists, it's bids, if not it's dandi. since all assets are associated with a dandiset.
Iirc we don't have association from asset to dandiset
@yarikoptic - we do (see s3 bucket) and can expose it via api. at least at this point, assets still have a one to one correspondence with dandiset, but one to many with dandiset/version. so if assets are listed per dandiset, they have explicit association.
@yarikoptic - we do (see s3 bucket) and can expose it via api. at least at this point, assets still have a one to one correspondence with dandiset, but one to many with dandiset/version. so if assets are listed per dandiset, they have explicit association.
I am asking since all such validations should be done within dandischema
, not through some ad-hoc logic/API within dandi-cli
if we would like dandi-archive
to use them, hence this issue here in dandischema
. So the question is really "where at the level of dandischema we do have such association?"
i think the difference i am making is that in dandischema such a function could take type of layout as input, but that doesn't need to be in the asset model.
Per brief monday discussion -- we might want/need to make validation of assets interface to be exposed to the notion of dandiset across the entire surface of the archive interfaces/implementation. But I would like to remind that dandi-schema is ATM the core logic for validation, so it might need to start here somehow and decide on how to expose/use it at dandi-cli/dandi-archive levels.
Might relate to https://github.com/dandi/dandi-archive/issues/1477 (by @bendichter ) where solution might be to establish regexes for URLs at dandi-schema level so they could be used/validated by both dandi-cli and dandi-archive UIs
i think the difference i am making is that in dandischema such a function could take type of layout as input, but that doesn't need to be in the asset model.
yes. my only thinking is that we might need later some other aspect of knowing dandiset metadata or identity (e.g. to do validation in conjnction with with other assets), and then would need to change it again.
why don't we start with copying the regex's over and adding a validation function to the asset model for the asset path.
in a follow up to https://github.com/dandi/dandi-cli/issues/1205#issuecomment-1421010945 of @satra
yes, that was done in https://github.com/dandi/dandi-cli/pull/1173 but it does not mean that it should stay here, at least as the "implementation logic".
In the past we removed
dandi-cli
from dependencies ofdandi-archive
IIRC to break the "dependency loop" sincedandi-cli
depends on API implementation of the archive. Hence we keep code worth reusing in both -cli and -archive elsewhere (dandi-schema, zarr_checksum) etc.So may be we should move the validate_organized_path into dandischema? But again -- then for an asset it would matter on either it is a BIDS asset or some other (DANDI layout) asset.
looking at a sample asset from 000026 which is BIDS -- we do not provide such metadata
```shell ❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/assets/3a6cdce9-ab49-41dd-967f-29349d0cc341/' -H 'accept: application/json' | jq . { "id": "dandiasset:3a6cdce9-ab49-41dd-967f-29349d0cc341", "path": "sub-EXC022/ses-MRI/anat/sub-EXC022_ses-MRI-echo-2_flip-2_VFA.json", "access": [ { "status": "dandi:OpenAccess", "schemaKey": "AccessRequirements" } ], "digest": { "dandi:sha2-256": "f48296e2812476e813c8d11e59ab1947ddfc6a8330c32ace04f040cd5d7d213e", "dandi:dandi-etag": "a7aaf0935f14bdb4d686486f4512a299-1" }, "@context": "https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json", "schemaKey": "Asset", "contentUrl": [ "https://api.dandiarchive.org/api/assets/3a6cdce9-ab49-41dd-967f-29349d0cc341/download/", "https://dandiarchive.s3.amazonaws.com/blobs/5d5/2c6/5d52c625-2efd-4ff2-a768-bd4f15e7da46" ], "identifier": "3a6cdce9-ab49-41dd-967f-29349d0cc341", "repository": "https://dandiarchive.org/", "contentSize": 247, "dateModified": "2021-10-05T14:50:57.025495-04:00", "schemaVersion": "0.6.0", "encodingFormat": "application/json", "wasGeneratedBy": [ { "id": "urn:uuid:7eec0bb1-668c-48da-a3be-a90df5485d8c", "name": "Metadata generation", "schemaKey": "Activity", "description": "Metadata generated by DANDI cli", "wasAssociatedWith": [ { "url": "https://github.com/dandi/dandi-cli", "name": "DANDI Command Line Interface", "version": "0.27.3", "schemaKey": "Software", "identifier": "RRID:SCR_019009" } ] } ], "blobDateModified": "2021-10-05T12:55:21.243891-04:00" } ```so to validate the path we should first start annotating assets with "layout" (optional, default to DANDI, could be BIDS). Then validate either using our DANDI layout regex, or BIDS (using bidsschematools like done in dandi-cli) purely on the path (ref also #74 ).