Open sneumann opened 10 months ago
An activity at the BH_2023 was to analyse what `DefinedTerm` definitions we find in the wild, specifically in the live-deploy's `exampleURL`s. Please see https://github.com/elixir-europe/biohackathon-projects-2023/tree/main/7/DefinedTerms for the collection and analysis of 30 `DefinedTerm`s.
Expanding on the format of `DefinedTerm` and `DefinedTermSet`, I would suggest a few modifications to the above example:
- `@id` to `DefinedTermSet` that points to an ontology (usually an OWL file)
- `DefinedTerm` so that consumers can have the term definition without doing other lookups

Things to note or possible issues:
- `termCode` should not have the prefix of the ontology (so 0005515 for GO and topic_0080 for EDAM)
- `url` in `DefinedTerm` and `DefinedTermSet` don't have a well defined scope/use (or I'm missing one) and are mostly copies of `@id`s

Examples:
{
"@type": "DefinedTerm",
"@id": "http://purl.bioontology.org/ontology/NCBITAXON/9606",
"termCode": "9606",
"name": "Homo sapiens",
"url": "http://purl.bioontology.org/ontology/NCBITAXON/9606",
"inDefinedTermSet": {
"@type": "DefinedTermSet",
"@id": "http://purl.obolibrary.org/obo/ncbitaxon.owl",
"name": "NCBI taxon",
"url": "https://bioportal.bioontology.org/ontologies/NCBITAXON"
},
"sameAs": [
"http://purl.uniprot.org/taxonomy/9606",
"https://identifiers.org/taxonomy:9606",
"http://purl.obolibrary.org/obo/NCBITaxon_9606"
]
}
{
"@type": "DefinedTerm",
"@id": "http://amigo.geneontology.org/amigo/term/GO:0005515",
"termCode": "0005515",
"name": "protein binding",
"url": "http://amigo.geneontology.org/amigo/term/GO:0005515",
"inDefinedTermSet": {
"@type": "DefinedTermSet",
"@id": "http://purl.obolibrary.org/obo/go.owl",
"name": "Gene Ontology",
"url": "https://bioportal.bioontology.org/ontologies/GO"
},
"sameAs": [
"http://purl.obolibrary.org/obo/GO_0005515",
"http://purl.bioontology.org/ontology/GO/GO:0005515",
"https://identifiers.org/GO:0005515",
"https://www.ebi.ac.uk/ols4/ontologies/go/terms?obo_id=GO:0005515",
"https://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005515"
]
}
{
"@type": "DefinedTerm",
"@id": "http://purl.bioontology.org/ontology/EDAM/topic_0080",
"termCode": "topic_0080",
"name": "Sequence analysis",
"url": "http://purl.bioontology.org/ontology/EDAM/topic_0080",
"inDefinedTermSet": {
"@type": "DefinedTermSet",
"@id": "http://edamontology.org",
"name": "EDAM ontology",
"url": "https://bioportal.bioontology.org/ontologies/EDAM"
},
"sameAs": [
"https://identifiers.org/edam:topic_0080",
"https://www.ebi.ac.uk/ols/ontologies/edam/terms?short_form=topic_0080"
]
}
{
"@type": "DefinedTerm",
"@id": "http://purl.bioontology.org/ontology/ECO/ECO:0005670",
"termCode": "0005670",
"name": "x-ray crystallography evidence used in manual assertion",
"url": "http://purl.bioontology.org/ontology/ECO/ECO:0005670",
"inDefinedTermSet": {
"@type": "DefinedTermSet",
"@id": "http://purl.obolibrary.org/obo/eco.owl",
"name": "Evidence & Conclusion Ontology (ECO)",
"url": "https://bioportal.bioontology.org/ontologies/ECO"
},
"sameAs": [
"http://purl.obolibrary.org/obo/ECO_0005670",
"https://identifiers.org/ECO:0005670",
"https://www.ebi.ac.uk/ols4/ontologies/eco/terms?obo_id=ECO:0005670"
]
}
Hi @ivanmicetic, I am unsure about `"termCode": "0005670"` without prefix.
The termCode alone is used nowhere in the OWL file:
<owl:Class rdf:about="http://purl.obolibrary.org/obo/ECO_0005670">
<oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ECO:0005670</oboInOwl:id>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/ECO_0005670"/>
While there is heavy confusion about `ECO_0005670` and `ECO:0005670`, the above example does not mention what (here: `ECO`) goes before the `:` or `_`. Unfortunately, there is no `termPrefix` property in the DefinedTermSet :-(
Without prefix, it is difficult to impossible to build any URLs or parameters to some established services like OLS.
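To make the point concrete, here is a minimal Python sketch. It assumes the OLS4 URL pattern that appears in the `sameAs` list of the ECO example above; the function name `ols_term_url` is hypothetical and only serves the illustration.

```python
# Base of the OLS4 term URLs, as seen in the sameAs entries of the examples above.
OLS4_BASE = "https://www.ebi.ac.uk/ols4/ontologies"


def ols_term_url(prefix: str, local_id: str) -> str:
    """Build an OLS4 term URL from an ontology prefix and a bare local id.

    Both the path segment and the obo_id parameter depend on the prefix,
    so a termCode like "0005670" alone is not enough.
    """
    if not prefix:
        raise ValueError("cannot build an OLS URL without the ontology prefix")
    obo_id = f"{prefix.upper()}:{local_id}"
    return f"{OLS4_BASE}/{prefix.lower()}/terms?obo_id={obo_id}"


if __name__ == "__main__":
    print(ols_term_url("ECO", "0005670"))
```

With only `"0005670"` at hand, a consumer would have to recover `ECO` from somewhere else, for instance by parsing the `@id` or the `DefinedTermSet`.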
The DefinedTerm documentation says "termCode: A code that identifies this DefinedTerm within a DefinedTermSet.", which is indeed a bit vague ...
If we'd stick with "termCode should not have the prefix of the ontology", I'd love to have a pointer to a resource that recommends this. Maybe that could be the definition from identifiers.org? Does that work for all ontologies we have in OBO and BioPortal?
Yours, Steffen
These things can never be made straight and we always have to live with them. In the KnetMiner project, we treat IDs like ECO_0005670 as accessions, usually attaching the source (GO, ECO, ENSEMBL, etc), and associating an item to the multiple accessions and accession variants it might have (ECO_XXX, ECO:XXX, etc).
`termCode` is a good property to represent such accessions, including the prefix and whatever separator is used for it, so I usually do `termCode = ECO_0005670` or `termCode = ECO:0005670`, depending on the data I import.
We rarely need to extract the 'term code' in the sense of the numerical part. To me, it doesn't mean much, apart from rare and peculiar use cases.
One case where we consider the composition is when we try to merge entities with the same or very similar accessions, e.g., if one term has ECO0005670 as accession and another ECO:0005670, then they're very likely the same, and this can be detected with a merge/normalisation tool, using a regex like `/[a-z]+[:_-]?[0-9]+/i`.
Apart from that case, we never consider the numerical part alone, and I've never felt the need to store it in cleaned/published data. There might be use cases where you actually want it, but if you adopt the idea that `termCode` isn't just for numerical codes, you can do: `x termCode 'ECO:0005670', 'ECO_0005670', '0005670'`.
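The merge/normalisation step described above could look roughly like the following Python sketch. It is not the KnetMiner implementation: the function names are hypothetical, the separator class `[:_-]` is an assumption chosen to cover the `ECO_XXX` and `ECO:XXX` variants mentioned earlier, and the choice of `PREFIX:digits` as the canonical form is arbitrary.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Prefix letters, an optional separator (":", "_" or "-"), then digits.
ACCESSION = re.compile(r"^([a-z]+)[:_-]?([0-9]+)$", re.IGNORECASE)


def normalise(accession: str) -> Optional[str]:
    """Return a canonical 'PREFIX:digits' form, or None if the shape is unknown."""
    match = ACCESSION.match(accession.strip())
    if match is None:
        return None
    prefix, number = match.groups()
    return f"{prefix.upper()}:{number}"


def group_variants(accessions: Iterable[str]) -> Dict[str, List[str]]:
    """Group accession variants that normalise to the same canonical form."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession in accessions:
        key = normalise(accession)
        if key is not None:
            groups[key].append(accession)
    return dict(groups)


if __name__ == "__main__":
    print(group_variants(["ECO0005670", "ECO:0005670", "ECO_0005670", "GO:0005515"]))
```

A bare `0005670` does not match the pattern, which reflects the point that the numerical part alone carries no usable context.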
@sneumann
> If we'd stick with "termCode should not have the prefix of the ontology", I'd love to have a pointer to a resource that recommends this. Maybe that could be the definition from identifiers.org?
The only place where termCode is separated from termPrefix, and possibly defined, is identifiers.org:

| Resource | Local Unique Identifier (LUI) pattern | Prefix embedded in LUI |
|---|---|---|
| EDAM | `^(data\|topic\|operation\|format)_\d{4}$` | No |
| NCBI taxonomy | `^\d+$` | No |
| ECO | `^ECO:\d{7}$` | Yes |
| GO | `^GO:\d{7}$` | Yes |
Note that this applies to compact identifiers or sample URLs for identifiers.org identifiers and not elsewhere, since ECO itself uses both `:` and `_`. GO behaves more coherently and uses only `:`.
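As a rough illustration of how the table could be used, the Python sketch below checks a local identifier against the LUI pattern and builds the compact identifier. The patterns are copied from the table; the namespace keys (`edam`, `taxonomy`, `eco`, `go`) are taken from the identifiers.org URLs in the examples earlier in this thread, and the function name `compact_identifier` is hypothetical.

```python
import re

# namespace -> (LUI pattern, is the prefix already embedded in the LUI?)
LUI_PATTERNS = {
    "edam": (re.compile(r"^(data|topic|operation|format)_\d{4}$"), False),
    "taxonomy": (re.compile(r"^\d+$"), False),
    "eco": (re.compile(r"^ECO:\d{7}$"), True),
    "go": (re.compile(r"^GO:\d{7}$"), True),
}


def compact_identifier(namespace: str, lui: str) -> str:
    """Validate a LUI for a namespace and return the compact identifier."""
    pattern, prefix_embedded = LUI_PATTERNS[namespace]
    if not pattern.match(lui):
        raise ValueError(f"{lui!r} is not a valid LUI for namespace {namespace!r}")
    # When the prefix is embedded, the LUI already is the compact identifier.
    return lui if prefix_embedded else f"{namespace}:{lui}"


if __name__ == "__main__":
    for namespace, lui in [("edam", "topic_0080"), ("taxonomy", "9606"), ("go", "GO:0005515")]:
        print("https://identifiers.org/" + compact_identifier(namespace, lui))
```

Under these patterns a prefix-less `0005670` is rejected for ECO, because identifiers.org expects the prefix to be part of the local identifier there.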
Here you can find a spreadsheet with the summary of proposed solutions/recommendations for DefinedTerm discussed in this issue. We could use it to see the most favored solution and to monitor the evolution/progress of this new profile (if you find it useful).
Regards, Ivan
Hi,
my initial urge was to say "duh, the local identifiers without prefix are useless, since there is no context and I wouldn't know how to use 'em then", similar to @marco-brandizi's comment above. Hence, I really had hoped we'd find a way to document this to be a https://en.wikipedia.org/wiki/CURIE. There are some recommendations in the documentation of the curies package: https://curies.readthedocs.io/en/latest/. We could recommend using the CURIE as it comes in the Bioregistry, e.g. https://bioregistry.io/registry/chmo.
Yours, Steffen
Hi again, as part of the discussion I hacked a jq script to reshape the response from the OLS based terminology service to return a `DefinedTerm`. The mapping would be
{
  "@type": "DefinedTerm",
  "@id": ._embedded.terms[0].iri,
  "termCode": ._embedded.terms[0].obo_id,
  "name": ._embedded.terms[0].label,
  "url": ("https://terminology.nfdi4chem.de/ts/ontologies/" + ._embedded.terms[0].ontology_name + "/terms?iri=" + ._embedded.terms[0].iri),
  "inDefinedTermSet": {
    "@type": "DefinedTermSet",
    "@id": ._embedded.terms[0].ontology_iri,
    "name": ._embedded.terms[0].ontology_name
  }
}
or the equivalent command line:
wget -q -O- 'https://service.tib.eu/ts4tib/api/ontologies/chmo/terms?CURIE=CHMO:0000470' |\
jq '{ "@type": "DefinedTerm", "@id": ._embedded.terms[0].iri, "termCode": ._embedded.terms[0].obo_id, "name": ._embedded.terms[0].label, "url": ("https://terminology.nfdi4chem.de/ts/ontologies/" + ._embedded.terms[0].ontology_name + "/terms?iri=" + ._embedded.terms[0].iri), "inDefinedTermSet": { "@type": "DefinedTermSet", "@id": ._embedded.terms[0].ontology_iri, "name": ._embedded.terms[0].ontology_name } }'
resulting in
{
"@type": "DefinedTerm",
"@id": "http://purl.obolibrary.org/obo/CHMO_0001921",
"termCode": "CHMO:0001921",
"name": "fluorescence anisotropy decay curve",
"url": "https://terminology.nfdi4chem.de/ts/ontologies/chmo/terms?iri=http://purl.obolibrary.org/obo/CHMO_0001921",
"inDefinedTermSet": {
"@type": "DefinedTermSet",
"@id": "http://purl.obolibrary.org/obo/chmo.owl",
"name": "chmo"
}
}
And yes, this comment is also here so I know where to find these code snippets later :-)
The information about the `DefinedTermSet` above is a bit poor, and would require a second call to obtain a bit more FAIR information.
Yours,
Steffen
@sneumann, I agree that the term code without prefix is quite useless and would favour the use of CURIEs. I had a quick look at the curies package and I like how they solved the standardization of CURIEs in order to handle multiple prefix synonyms as well as URI prefix synonyms:
from curies import Converter, Record

converter = Converter([
    Record(
        prefix="GO",
        prefix_synonyms=["gomf", "gocc", "gobp", "go", ...],
        uri_prefix="http://purl.obolibrary.org/obo/GO_",
        uri_prefix_synonyms=[
            "http://amigo.geneontology.org/amigo/term/GO:",
            "https://identifiers.org/GO:",
            ...
        ],
    ),
    # And so on
    ...
])
>>> converter.standardize_prefix("gomf")
'GO'
>>> converter.standardize_curie('gomf:0032571')
'GO:0032571'
>>> converter.standardize_uri('http://amigo.geneontology.org/amigo/term/GO:0032571')
'http://purl.obolibrary.org/obo/GO_0032571'
Maybe we could translate this concept to `DefinedTerm`, or at least force the use of standardized CURIEs (and standardized IRIs) in our profiles?
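One way to picture that translation is the dependency-free Python sketch below. It is not the curies implementation and covers only a single record: the dictionary mirrors the GO `Record` shown above (with the elided `...` entries left out), and the function names and the idea of rewriting `@id` and `termCode` in place are assumptions made for illustration.

```python
from typing import Optional

# Mirrors the GO Record from the snippet above, minus the elided entries.
GO_RECORD = {
    "prefix": "GO",
    "prefix_synonyms": ["gomf", "gocc", "gobp", "go"],
    "uri_prefix": "http://purl.obolibrary.org/obo/GO_",
    "uri_prefix_synonyms": [
        "http://amigo.geneontology.org/amigo/term/GO:",
        "https://identifiers.org/GO:",
    ],
}


def standardize_uri(uri: str, record: dict) -> Optional[str]:
    """Rewrite a URI that uses any known URI prefix to the preferred one."""
    for uri_prefix in [record["uri_prefix"], *record["uri_prefix_synonyms"]]:
        if uri.startswith(uri_prefix):
            return record["uri_prefix"] + uri[len(uri_prefix):]
    return None


def standardize_defined_term(term: dict, record: dict) -> dict:
    """Return a copy of a DefinedTerm with a standardized @id and CURIE termCode."""
    result = dict(term)
    uri = standardize_uri(term["@id"], record)
    if uri is None:
        return result  # unknown URI prefix: leave the term untouched
    local_id = uri[len(record["uri_prefix"]):]
    result["@id"] = uri
    result["termCode"] = f"{record['prefix']}:{local_id}"
    return result


if __name__ == "__main__":
    term = {
        "@type": "DefinedTerm",
        "@id": "http://amigo.geneontology.org/amigo/term/GO:0005515",
        "termCode": "0005515",
        "name": "protein binding",
    }
    print(standardize_defined_term(term, GO_RECORD))
```

Applied to the GO example from the start of the thread, this would turn the AmiGO `@id` into the OBO PURL and the bare `0005515` into the CURIE `GO:0005515`, while leaving terms with an unrecognised URI prefix unchanged.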
Hi, several people are representing terms from ontologies via DefinedTerm (example), but I guess there are different flavors out there of how exactly to do that. Hence I would like to call for 1) better documentation, e.g. on our `Getting Started` tab, and/or 2) even a profile for an ontology-backed `DefinedTerm`. The main rationale is that I see validators and harvesters starting to connect to terminology services, so we should make it easy for them to recognise and follow ontology terms.

So, starting towards better documentation, can we come up with examples and promises how to represent a `DefinedTerm`?

I am most concerned about our recommendations for `@id`, `identifier`, `url`, `termCode`, all of which somehow identify/lead to the ontology term.

Similarly, we might want recommendations for the DefinedTermSet. Above we have:

Is that enough as minimum information? Very often we have `@context`, `@id` and for profiles `dct:conformsTo` as marginality minimum. How do we tell validators that there is an external controlled vocabulary/ontology behind a term, and not just a flat list of hasDefinedTerm in the set? Do we specify the ontology lookup services as `url`? Anything else we'd need for `DefinedTermSet`?

Yours, Steffen