Open jsheunis opened 1 year ago
Another idea:
The schema can accept additional fields without failing to validate. This means that metadata containing new fields can be passed to the catalog without having to update the catalog schema. If e.g. an extra property additional_display_definitions is passed, where this is an object providing semantic definitions of the keys and values in additional_display, these definitions could be rendered in the catalog as extra information.
For example, let's say we have the following additional_display property of a dataset-level metadata item:
"additional_display": [
  {
    "name": "SFB1451",
    "icon": "fa-solid fa-flask",
    "content": {
      "homepage": "https://github.com/allisonhorst/palmerpenguins",
      "CRC project": "INF",
      "data controller": {
        "email": "ahorst@ucsb.edu",
        "name": "Allison Horst"
      },
      "sample (organism)": [
        "Adelie penguin (Pygoscelis adeliae; NCBITaxon_9238)",
        "Gentoo penguin (Pygoscelis papua; NCBITaxon_30457)",
        "chinstrap penguin (Pygoscelis antarcticus; NCBITaxon_79643)"
      ],
      "sample (organism part)": "body proper (UBERON_0013702)",
      "Used for": "Testing effort for DBI backends \u2014 The dataset is used as example data for testing data base backend features in automated tests (https://dbitest.r-dbi.org/)"
    }
  }
],
then the agent creating this metadata item could also include another property, additional_display_definitions:
"additional_display_definitions": [
  {
    "name": "SFB1451",
    "keys": {
      "homepage": "https://schema.org/mainEntityOfPage",
      "CRC project": "",
      "data controller": {
        "self": "https://w3id.org/dpv#hasDataController",
        "email": "https://schema.org/email",
        "name": "https://schema.org/name"
      },
      "sample (organism)": "",
      "sample (organism part)": "",
      "Used for": "http://www.w3.org/ns/prov#hadUsage"
    },
    "values": {
      "homepage": {},
      "CRC project": {},
      "data controller": {
        "email": {},
        "name": {}
      },
      "sample (organism)": {},
      "sample (organism part)": {},
      "Used for": {}
    }
  }
]
Both keys and values are included above, since either or both could have semantic definitions. But it's not expected that both or either would always be provided or necessary.
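To make the pairing concrete, here is a minimal Python sketch of how a renderer might join the content of an additional_display entry with the keys part of its additional_display_definitions entry. The function name `annotate` and the output row layout are hypothetical and not part of datalad-catalog; the only convention taken from the example above is that a nested object stores its own definition under "self" and that an empty string means "no definition".

```python
from typing import Any


def annotate(content: dict[str, Any], key_defs: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each displayed key with its semantic definition, if one was given."""
    rows = []
    for key, value in content.items():
        definition = key_defs.get(key, "")
        children: list[dict[str, Any]] = []
        if isinstance(definition, dict):
            # A nested object carries its own definition under "self";
            # the remaining entries define the nested keys.
            nested_defs = {k: v for k, v in definition.items() if k != "self"}
            definition = definition.get("self", "")
            if isinstance(value, dict):
                children = annotate(value, nested_defs)
        rows.append(
            {
                "key": key,
                "value": value,
                # Empty strings mean that no definition was provided.
                "definition": definition or None,
                "children": children,
            }
        )
    return rows


if __name__ == "__main__":
    content = {
        "homepage": "https://github.com/allisonhorst/palmerpenguins",
        "CRC project": "INF",
        "data controller": {"email": "ahorst@ucsb.edu", "name": "Allison Horst"},
    }
    key_defs = {
        "homepage": "https://schema.org/mainEntityOfPage",
        "CRC project": "",
        "data controller": {
            "self": "https://w3id.org/dpv#hasDataController",
            "email": "https://schema.org/email",
            "name": "https://schema.org/name",
        },
    }
    for row in annotate(content, key_defs):
        print(row["key"], "->", row["definition"])
```

Keys without a definition simply come back with `None`, which matches the expectation that definitions are optional.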
Relatedly, the whole context of the original metadata item before it was translated into the catalog schema could also be passed to the catalog as part of the metadata record, e.g. in the property @context. AFAIK this isn't a reserved keyword/property in jsonschema.
[Idea 2] We could add another generic key-value property to the schema, something like semantic_metadata, which would allow for passing the term definitions along with the key and value, for multiple records. This could then get a dedicated display area in a catalog's dataset page.
I think "sample (organism)" and similar are the natural candidates for this approach. Here, we were starting with either a code (NCBITaxon:9237) or an equivalent IRI (http://purl.obolibrary.org/obo/NCBITaxon_9237).
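The code-to-IRI equivalence mentioned here can be sketched in a few lines of Python. This assumes OBO-style identifiers only (prefix, colon, local id), mapped onto the purl.obolibrary.org pattern quoted above; the function name `curie_to_iri` is hypothetical and other vocabularies would need their own prefix map.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie: str) -> str:
    """Turn an OBO-style code such as 'NCBITaxon:9237' into its IRI."""
    prefix, sep, local_id = curie.partition(":")
    # Reject inputs without a colon, with empty parts, or that already look like URLs.
    if not sep or not prefix or not local_id or local_id.startswith("//"):
        raise ValueError(f"not an OBO-style code: {curie!r}")
    return f"{OBO_BASE}{prefix}_{local_id}"


if __name__ == "__main__":
    print(curie_to_iri("NCBITaxon:9237"))
```

No network request is involved, so a translation step could do this without querying any lookup service.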
AFAIK, the current "additional display" can only show strings (or repr of an object, which is still a string). I found no way to display a URL.
FTR, "Adelie penguin (Pygoscelis adeliae; NCBITaxon_9238)" was created from "NCBITaxon:9238" by a basic request-response API query to the Ontology Lookup Service while translating incoming data to the catalog schema. I think the catalog should not do any such queries, but it could allow displaying URLs as links.
Maybe we could have a way to pass a (specifically formatted?) object containing text and url to additional display that would be rendered as a hyperlink?
Some of this is now addressed with https://github.com/datalad/datalad-catalog/pull/347. Although indeed, URLs will still not render as links. I agree we should get that to work.
There are two ways of approaching it: either rely on an explicit url field, or check if the actual value contains http[s]:// and assume that it should then be rendered as a link.
I do however think it might be a nifty feature for a user to be able to query the definition of a term instead of (or in addition to) being presented with an uninformative link. This doesn't have to happen automatically, rather after specific user input, e.g. clicking on an info icon.
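As a rough illustration of the two approaches, the Python sketch below renders a value either from an explicit object with text and url fields, or by detecting http[s]:// inside a plain string. The catalog's actual rendering happens in its web frontend, so this only demonstrates the decision logic; the function names, the text/url object shape, and the URL pattern are assumptions.

```python
import html
import re

# Deliberately simple pattern: stops at whitespace, quotes, angle brackets and ")".
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def linkify(text: str) -> str:
    """Wrap every http(s) URL found in a plain string in an anchor tag."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


def render_value(value) -> str:
    """Render one additional_display value as an HTML fragment."""
    # Approach 1: an explicitly formatted object with "url" and optional "text".
    if isinstance(value, dict) and "url" in value:
        url = html.escape(str(value["url"]), quote=True)
        text = html.escape(str(value.get("text", value["url"])))
        return f'<a href="{url}">{text}</a>'
    # Approach 2: guess from the string content.
    return linkify(str(value))


if __name__ == "__main__":
    print(render_value({"text": "Adelie penguin",
                        "url": "http://purl.obolibrary.org/obo/NCBITaxon_9238"}))
    print(render_value("automated tests (https://dbitest.r-dbi.org/)"))
```

The explicit object keeps a readable label next to the link target, while the pattern-based guess needs no changes from metadata producers but can only show the bare URL.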
The following issue cannot be transferred here, but is highly applicable in that it started discussing thoughts of catalog rendering from a semantic data view: https://github.com/psychoinformatics-de/sfb1451-projects-catalog/issues/46
Relevant issue related to additional_display rendering, when additional display has semantic information included: https://github.com/sfb1451/tabby-utils/issues/14
This has since been incorporated into the main branch of this repo. In addition, I think it would be useful for existing catalog instances to have a new and separate field in the dataset schema, something like "semantic properties", somewhat similar to the "relation" field used in datalad-concepts. This would be a good place for any property of the dataset that can be expressed semantically and is typically some relation, e.g. "sameas", "homepage", "study field".
It would be ideal if datalad's whole metadata handling and rendering stack could work with JSON-LD data in a seamless way.
An example use-case is the tabby-to-catalog pipeline: if we have JSON-LD records coming out of tabby files, how do we handle these records in order to have the rich semantic information rendered sensibly in a catalog?
Let's use the current catalog schema, here a part of the specific schema for a dataset, as an example to work from:
The dataset schema has several properties that can be contained in an incoming metadata record, and only a minimal number of properties are required. There are properties that expect specific fields and formats (e.g. author, which should have givenName, familyName, etc.) and there are properties that can receive generic key-value pairs (e.g. additional_display and top_display).
How should we approach updates to such a schema in order to allow JSON-LD data through?
Idea 1
If we want to define catalog schema terms ourselves, i.e. turn it into a semantic schema, we could for example add a definition field to each property, which would contain a definition URL from some accessible ontology.
Idea 2
We could add another generic key-value property to the schema, something like semantic_metadata, which would allow for passing the term definitions along with the key and value, for multiple records. This could then get a dedicated display area in a catalog's dataset page.
Idea 3
Perhaps the catalog should somehow evolve into something that can just interpret and render any JSON-LD document, or at least those adhering to some convention as described by a given context. This whole concept needs further exploration, and possibly a paradigm shift in terms of how metadata is added to a catalog and rendered by it.
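As a very small illustration of what "interpreting a document by its context" could mean for Idea 3, the Python sketch below replaces each top-level key of a record with the IRI that an inline @context assigns to it. This is a simplified term lookup under stated assumptions, not the JSON-LD expansion algorithm: it ignores remote contexts, nesting, prefixes and typed values, and the function name `expand_terms` is hypothetical.

```python
from typing import Any


def expand_terms(record: dict[str, Any]) -> dict[str, Any]:
    """Map top-level keys to the IRIs given by an inline @context, if any."""
    context = record.get("@context", {})
    if not isinstance(context, dict):
        # Remote contexts (given as a URL string) or lists are not resolved in this sketch.
        context = {}
    expanded = {}
    for key, value in record.items():
        if key == "@context":
            continue
        term = context.get(key, key)
        if isinstance(term, dict):
            # Term definitions may be objects that hold the IRI under "@id".
            term = term.get("@id", key)
        expanded[term] = value
    return expanded


if __name__ == "__main__":
    record = {
        "@context": {
            "homepage": "https://schema.org/mainEntityOfPage",
            "name": {"@id": "https://schema.org/name"},
        },
        "homepage": "https://github.com/allisonhorst/palmerpenguins",
        "name": "Allison Horst",
        "CRC project": "INF",
    }
    print(expand_terms(record))
```

Keys that the context does not define are passed through unchanged, so a renderer could still show them as plain key-value pairs.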