dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

provide LD-ready `ls -f jsonld{,_pp}` output? #700

Open yarikoptic opened 3 years ago

yarikoptic commented 3 years ago

@satra , sorry for a basic LD question -- how would we do that? ;)

@surchs is interested to see what dandischema records might look like in json-ld and might be interested to use/help with all the json-ld'ification.

ATM we have a context at https://raw.githubusercontent.com/dandi/schema/master/releases/0.4.4/context.json which we do provide in the json representation of the metadata for a Dandiset:

an example of a "json_dict" dump of a Dandiset

```shell
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets$ for ds in 000*; do python -c "from dandischema.models import Dandiset; import yaml; ds = Dandiset.unvalidated(**yaml.load(open('$ds/dandiset.yaml'), Loader=yaml.FullLoader));print(ds.json(indent=2))" && echo $ds; break; done
{
  "@context": "https://raw.githubusercontent.com/dandi/schema/master/releases/0.4.4/context.json",
  "about": [
    {
      "identifier": "UBERON:0002436",
      "name": "primary visual cortex",
      "schemaKey": "Anatomy"
    }
  ],
  "access": [
    {
      "contactPoint": {
        "email": "petersen.peter@gmail.com",
        "schemaKey": "ContactPoint"
      },
      "status": "dandi:OpenAccess"
    }
  ],
  "assetsSummary": {
    "approach": [
      {
        "name": "electrophysiological approach",
        "schemaKey": "ApproachType"
      },
      {
        "name": "behavioral approach",
        "schemaKey": "ApproachType"
      }
    ],
    "dataStandard": [
      {
        "identifier": "RRID:SCR_015242",
        "name": "Neurodata Without Borders (NWB)",
        "schemaKey": "StandardsType"
      }
    ],
    "measurementTechnique": [
      {
        "name": "signal filtering technique",
        "schemaKey": "MeasurementTechniqueType"
      },
      {
        "name": "fourier analysis technique",
        "schemaKey": "MeasurementTechniqueType"
      },
      {
        "name": "spike sorting technique",
        "schemaKey": "MeasurementTechniqueType"
      },
      {
        "name": "behavioral technique",
        "schemaKey": "MeasurementTechniqueType"
      },
      {
        "name": "multi electrode extracellular electrophysiology recording technique",
        "schemaKey": "MeasurementTechniqueType"
      }
    ],
    "numberOfBytes": 2559248010229,
    "numberOfFiles": 101,
    "numberOfSubjects": 16,
    "schemaKey": "AssetsSummary",
    "species": [
      {
        "identifier": "http://purl.obolibrary.org/obo/NCBITaxon_10090",
        "name": "House mouse",
        "schemaKey": "SpeciesType"
      }
    ],
    "variableMeasured": [
      "DecompositionSeries",
      "LFP",
      "Units",
      "Position",
      "ElectricalSeries"
    ]
  },
  "citation": "Senzai, Yuta; Fernandez-Ruiz, Antonio; Buzs\u00e1ki, Gy\u00f6rgy (2021) Layer-Specific Physiological Features and Interlaminar Interactions in the Primary Visual Cortex of the Mouse (Version draft) [Data set]. DANDI archive. https://dandiarchive.org/dandiset/000003/draft",
  "contributor": [
    {
      "affiliation": [],
      "includeInCitation": true,
      "name": "Senzai, Yuta",
      "roleName": [
        "dcite:Author",
        "dcite:ContactPerson",
        "dcite:DataCollector",
        "dcite:FormalAnalysis"
      ],
      "schemaKey": "Person"
    },
    {
      "affiliation": [],
      "identifier": "0000-0001-8481-0796",
      "includeInCitation": true,
      "name": "Fernandez-Ruiz, Antonio",
      "roleName": [
        "dcite:Author",
        "dcite:FormalAnalysis"
      ],
      "schemaKey": "Person"
    },
    {
      "affiliation": [
        {
          "contactPoint": [],
          "identifier": "https://ror.org/005dvqh91",
          "includeInCitation": false,
          "name": "New York University Langone Medical Center",
          "roleName": [],
          "schemaKey": "Affiliation",
          "url": "http://nyulangone.org/"
        }
      ],
      "identifier": "0000-0002-3100-4800",
      "includeInCitation": true,
      "name": "Buzs\u00e1ki, Gy\u00f6rgy",
      "roleName": [
        "dcite:Author"
      ],
      "schemaKey": "Person"
    }
  ],
  "description": "Data from \"Layer-Specific Physiological Features and Interlaminar Interactions in the Primary Visual Cortex of the Mouse\" Senzai, Fernandez-Ruiz, Buzsaki, Neuron 2019. Electrophysiology recordings of hippocampus during theta maze exploration.",
  "ethicsApproval": [],
  "id": "DANDI:000003/draft",
  "identifier": "DANDI:000003",
  "keywords": [
    "cell types",
    "cortical layers",
    "current source density",
    "laminar recordings",
    "optogenetics",
    "oscillations",
    "primary visual cortex",
    "sleep",
    "alpha rhythm"
  ],
  "license": [
    "spdx:CC-BY-4.0"
  ],
  "manifestLocation": [],
  "name": "Layer-Specific Physiological Features and Interlaminar Interactions in the Primary Visual Cortex of the Mouse",
  "protocol": [],
  "relatedResource": [
    {
      "identifier": "doi:10.1016/j.neuron.2016.12.011",
      "relation": "dcite:IsDescribedBy",
      "url": "https://doi.org/10.1016/j.neuron.2016.12.011"
    }
  ],
  "repository": "https://dandiarchive.org/",
  "schemaKey": "Dandiset",
  "schemaVersion": "0.4.4",
  "studyTarget": [],
  "url": "https://dandiarchive.org/dandiset/000003/draft",
  "version": "draft",
  "wasGeneratedBy": [],
  "acknowledgement": null,
  "dateCreated": null,
  "dateModified": null
}
```

but I guess many items/contexts are just "ending" at dandi: and even age is not really "linked" anywhere.

and if we do add @context to `Dandiset`'s json_dict dump, it seems absent from an asset record

```shell
$> python -c 'import json; from dandischema.models import Asset; from dandi.dandiapi import DandiAPIClient as C; c=C(); ds = c.get_dandiset("000004"); asset = next(ds.get_assets());print(json.dumps(Asset(**asset.get_raw_metadata()).json_dict(), indent=2))'
{
  "id": "dandiasset:38304fe9-5f37-4c0d-a741-9cf2bafab9ff",
  "schemaKey": "Asset",
  "schemaVersion": "0.4.4",
  "keywords": [
    "Intracranial Recordings",
    "Intractable Epilepsy",
    "Single-Unit Recordings",
    "Cognitive Neuroscience",
    "Learning",
    "Memory",
    "Neurosurgery"
  ],
  "access": [
    {
      "schemaKey": "AccessRequirements",
      "status": "dandi:OpenAccess"
    }
  ],
  "repository": "https://dandiarchive.org/",
  "wasGeneratedBy": [
    {
      "id": "urn:uuid:4fa83952-03c6-4868-b9dd-1247ac8ae6b4",
      "schemaKey": "Activity",
      "name": "Metadata generation",
      "description": "Metadata generated by DANDI cli",
      "wasAssociatedWith": [
        {
          "schemaKey": "Software",
          "identifier": "RRID:SCR_019009",
          "name": "DANDI Command Line Interface",
          "version": "0.22.0",
          "url": "https://github.com/dandi/dandi-cli"
        }
      ]
    }
  ],
  "contentSize": 73156888,
  "encodingFormat": "application/x-nwb",
  "digest": {
    "dandi:sha2-256": "c4994d36fe0c7f0c19917b452c9355567ade7f449f17156dffea5cd5e53e519f",
    "dandi:dandi-etag": "72aede0fb38eef9a1e94350c6fe7382b-2"
  },
  "path": "sub-P10HMH/sub-P10HMH_ses-20060901_ecephys+image.nwb",
  "dateModified": "2021-07-01T16:30:11.012406-04:00",
  "blobDateModified": "2020-10-20T23:40:40.971199-04:00",
  "approach": [
    {
      "schemaKey": "ApproachType",
      "name": "electrophysiological approach"
    }
  ],
  "measurementTechnique": [
    {
      "schemaKey": "MeasurementTechniqueType",
      "name": "spike sorting technique"
    }
  ],
  "variableMeasured": [
    {
      "schemaKey": "PropertyValue",
      "value": "Units"
    }
  ],
  "wasAttributedTo": [
    {
      "schemaKey": "Participant",
      "identifier": "P10HMH",
      "sex": {
        "schemaKey": "SexType",
        "identifier": "http://purl.obolibrary.org/obo/PATO_0000384",
        "name": "Male"
      },
      "species": {
        "schemaKey": "SpeciesType",
        "identifier": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
        "name": "Human"
      }
    }
  ],
  "identifier": "38304fe9-5f37-4c0d-a741-9cf2bafab9ff",
  "contentUrl": [
    "https://api.dandiarchive.org/api/assets/38304fe9-5f37-4c0d-a741-9cf2bafab9ff/download/",
    "https://dandiarchive.s3.amazonaws.com/blobs/284/eb3/284eb346-0bc5-42a6-9b33-268e6b0b0bde"
  ]
}
```

satra commented 3 years ago

the current API responses for a versioned dandiset and individual assets are LD compliant, i.e. they will convert to triples, turtle, etc. you can test it with reproschema. examples for a dandiset and an asset:

reproschema convert --format turtle https://api.dandiarchive.org/api/dandisets/000004/versions/draft/
reproschema convert --format turtle https://api.dandiarchive.org/api/dandisets/000004/versions/draft/assets/38304fe9-5f37-4c0d-a741-9cf2bafab9ff/

you can change format from turtle to n-triples to get statements. and you can provide your context to transform from one jsonld representation to another.

> and if we do add @context to Dandiset's json_dict dump, it seems absent from an asset record

we cannot use "@context" presently. we could add a context field that we populate, and overwrite the key name when we export. but "@context" is the one key in jsonld that cannot be overridden, so i have stayed away from it in the models. we could provide a jsonld export that adds the context. right now, it's added in the API layer.
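
A minimal sketch of the export idea described above, assuming a plain dict such as the one `json_dict()` returns. The helper name `with_context` and the choice to place `@context` first are illustrative assumptions, not part of dandischema; the context URL is the 0.4.4 one quoted earlier in this thread.

```python
import json

# Context URL quoted earlier in the thread for schema release 0.4.4.
CONTEXT_URL = (
    "https://raw.githubusercontent.com/dandi/schema/master/"
    "releases/0.4.4/context.json"
)


def with_context(record, context=CONTEXT_URL):
    """Return a copy of `record` with "@context" as its first key.

    Hypothetical helper: the model itself stays free of "@context",
    and the key is only attached at export time.
    """
    if "@context" in record:
        # Records that already carry a context are copied unchanged.
        return dict(record)
    return {"@context": context, **record}


# Trimmed-down asset record, using fields from the asset dump above.
asset = {
    "id": "dandiasset:38304fe9-5f37-4c0d-a741-9cf2bafab9ff",
    "schemaKey": "Asset",
    "schemaVersion": "0.4.4",
}

exported = with_context(asset)
print(json.dumps(exported, indent=2))
```

Keeping this as a separate export step means the original record is never mutated, so the same dict can still be fed back into the model for validation.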

satra commented 3 years ago

if you add the following to a prefixes.json file

{   "DANDI": "http://identifiers.org/DANDI:",
    "dandi": "http://schema.dandiarchive.org/",
    "dct": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfa": "http://www.w3.org/ns/rdfa#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "prov": "http://www.w3.org/ns/prov#",
    "pav": "http://purl.org/pav/",
    "nidm": "http://purl.org/nidash/nidm#",
    "uuid": "http://uuid.repronim.org/",
    "rs": "http://schema.repronim.org/",
    "RRID": "https://scicrunch.org/resolver/RRID:",
    "ORCID": "https://orcid.org/",
    "ROR": "https://ror.org/",
    "PATO": "http://purl.obolibrary.org/obo/PATO_"
}

you can also do:

reproschema convert --format turtle --prefixfile prefixes.json https://api.dandiarchive.org/api/dandisets/000004/versions/draft/assets/38304fe9-5f37-4c0d-a741-9cf2bafab9ff/
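
To illustrate what a prefix map like this makes possible, here is a rough sketch that shortens full IRIs to prefixed names using a few entries from the file above. It is not how reproschema is implemented; the function name `compact_iri` and the longest-match rule are assumptions made for the example.

```python
# A few entries copied from the prefixes.json shown above.
PREFIXES = {
    "dandi": "http://schema.dandiarchive.org/",
    "prov": "http://www.w3.org/ns/prov#",
    "ROR": "https://ror.org/",
    "PATO": "http://purl.obolibrary.org/obo/PATO_",
}


def compact_iri(iri, prefixes=PREFIXES):
    """Shorten `iri` to `prefix:local` when a namespace matches.

    The longest matching namespace wins; IRIs without a match are
    returned unchanged.
    """
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return iri
    return f"{best}:{iri[len(prefixes[best]):]}"


# Identifiers taken from the records earlier in the thread.
for iri in (
    "http://purl.obolibrary.org/obo/PATO_0000384",
    "https://ror.org/005dvqh91",
    "http://purl.obolibrary.org/obo/NCBITaxon_9606",
):
    print(compact_iri(iri))
```

The NCBITaxon IRI comes back unchanged because the prefix file above has no entry for that namespace.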
satra commented 3 years ago

@yarikoptic - easiest way to return jsonld for api assets is to just return raw metadata. we should think about what would make sense in dandischema/cli