Open beckyjackson opened 5 years ago
From the EWG discussion on this:
Partial automation possible, especially with respect to use of its terms in other ontologies and citations.
Chris M commented: The curation of usages must be manual and closely vetted by OBO Foundry.
We have usages partially curated here:
https://github.com/OBOFoundry/OBOFoundry.github.io/issues/451
Once in place the checks themselves can be automated.
Also easy to check things like GH activity. While it's conceivable that some ontologies with multiple users don't use GH, it is at least a meaningful signal.
The principle says "Use of the target ontology’s term IRIs in other ontologies. This can be evidenced by linking to the other ontology that uses an ontology term IRI from this ontology" We could search for term use in other ontologies.
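A search for term use in another ontology could start from something as simple as the sketch below. It assumes the standard OBO PURL pattern (`http://purl.obolibrary.org/obo/IDSPACE_digits`) and scans the raw text of the other ontology; the function name `find_term_iris` and the sample snippet are illustrative only. A plain count like this cannot tell real use apart from a wholesale import.

```python
import re
from collections import Counter

# Standard OBO PURL prefix; term IRIs look like <prefix>UBERON_0000468.
OBO_PURL = "http://purl.obolibrary.org/obo/"


def find_term_iris(text: str, id_space: str) -> Counter:
    """Count occurrences of each term IRI from `id_space` found in `text`.

    `text` is the serialized content of some *other* ontology (OWL, OBO, ...).
    """
    pattern = re.compile(re.escape(OBO_PURL + id_space + "_") + r"\d+")
    return Counter(pattern.findall(text))


# Made-up snippet standing in for another ontology's RDF/XML.
sample = """
<owl:Class rdf:about="http://purl.obolibrary.org/obo/MP_0000001">
  <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/UBERON_0000468"/>
</owl:Class>
<owl:Class rdf:about="http://purl.obolibrary.org/obo/UBERON_0000468"/>
<owl:Class rdf:about="http://purl.obolibrary.org/obo/UBERON_0001062"/>
"""

hits = find_term_iris(sample, "UBERON")
print(f"{len(hits)} distinct terms, {sum(hits.values())} mentions")
```

Reporting distinct terms separately from total mentions gives a reviewer a first hint about whether a handful of terms are reused or the whole ontology was pulled in.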
I agree with @beckyjackson's proposed standardization of the usages tag.
I think querying for ontology usage also makes sense. It would be fun to do a more in-depth analysis to identify "citation rings" and other artefacts.
Could this check also look at the 'browser' section on the OBO foundry page (https://github.com/OBOFoundry/OBOFoundry.github.io/blob/master/ontology/mp.md)? The MP entry lists the MGI, RGD, and Monarch browsers, and I was wondering if that should/could contribute to the plurality of users check.
We can also query eutils to look at the number of citations of the publication(s).
We could also add this as links from the obo site, e.g. we track the uberon pmid as 22293552, can add a link to:
https://www.ncbi.nlm.nih.gov/pubmed?linkname=pubmed_pubmed_citedin&from_uid=22293552
Of course, many ontologies are under-cited, but it's a proxy
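A rough sketch of the eutils idea, for illustration only. It assumes the E-utilities ELink endpoint with `linkname=pubmed_pubmed_citedin` (the same link name as in the PubMed URL above) and `retmode=json`; the JSON layout handled in `count_citing` (`linksets` / `linksetdbs` / `links`) is an assumption that should be checked against a live response. The function names are made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
CITED_IN = "pubmed_pubmed_citedin"


def cited_in_url(pmid: str) -> str:
    """Build the ELink request URL for records that cite `pmid`."""
    params = {"dbfrom": "pubmed", "linkname": CITED_IN, "id": pmid, "retmode": "json"}
    return f"{ELINK}?{urlencode(params)}"


def count_citing(payload: dict) -> int:
    """Count citing records in a parsed ELink response (assumed layout)."""
    total = 0
    for linkset in payload.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("linkname") == CITED_IN:
                total += len(linksetdb.get("links", []))
    return total


def fetch_citation_count(pmid: str) -> int:
    """Network call; not executed here."""
    with urlopen(cited_in_url(pmid), timeout=30) as response:
        return count_citing(json.load(response))


print(cited_in_url("22293552"))
```

Keeping the URL builder and the response parser separate from the network call makes the logic testable offline, which matters if this runs as part of an automated dashboard.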
We can also do a google search for mentions of the ontology (but this can't be done via API AFAIK)
Also note that ontologies can be over-cited too. These are cases where the ontology was mentioned (usually as part of a "such as..." list) but not used or studied in any way. This is similar to what happens in OntoBee when it shows term usage in other ontologies, the vast majority of which are due to some wholesale import of the ontology (but the term in question was never used).
Very good point @nataled! Dare I say it, a lot of this over-citation may come from papers about ontologies...
There are no objections to the schema @beckyjackson proposes.
I would add: make examples mandatory, but multivalued, i.e. cardinality >= 1.
@apmody and I are working on this in #1371. The proposed schema above is a little too simple. People are making good use of seeAlso to point to Biosharing/FAIRSharing, and of reference to link to publications about the usage. So we're going to try this schema:
usages:
  - user: required URL
    type: optional text (how the ontology is used, e.g. annotation)
    description: required text
    seeAlso: optional URL (e.g. FAIRSharing entry)
    examples:
      - url: required URL
        description: required text
    publications:
      - id: required URL (DOI, PubMed, etc.)
        title: required text
I like it!
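For anyone wanting to try the schema out, here is a minimal validation sketch. It assumes the usages block has already been loaded from the ontology YAML into plain Python lists and dicts, and it treats examples and publications as optional lists, which is one reading of the schema (the cardinality >= 1 suggestion for examples is not enforced). `validate_usages` and the helper names are hypothetical, not part of any existing OBO tooling.

```python
from urllib.parse import urlparse


def is_url(value) -> bool:
    """True for http(s) strings with a host part; does not check resolution."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def _require_url(item, key, where, errors):
    if not is_url(item.get(key)):
        errors.append(f"{where}: '{key}' is a required URL")


def _require_text(item, key, where, errors):
    value = item.get(key)
    if not isinstance(value, str) or not value.strip():
        errors.append(f"{where}: '{key}' is required text")


def validate_usages(usages) -> list:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for i, usage in enumerate(usages or []):
        where = f"usages[{i}]"
        _require_url(usage, "user", where, errors)
        _require_text(usage, "description", where, errors)
        if "type" in usage and not isinstance(usage["type"], str):
            errors.append(f"{where}: 'type' must be text")
        if "seeAlso" in usage and not is_url(usage["seeAlso"]):
            errors.append(f"{where}: 'seeAlso' must be a URL")
        for j, example in enumerate(usage.get("examples", [])):
            _require_url(example, "url", f"{where}.examples[{j}]", errors)
            _require_text(example, "description", f"{where}.examples[{j}]", errors)
        for j, pub in enumerate(usage.get("publications", [])):
            _require_url(pub, "id", f"{where}.publications[{j}]", errors)
            _require_text(pub, "title", f"{where}.publications[{j}]", errors)
    return errors


# Placeholder entries, not real usages.
good = {
    "user": "https://example.org/project",
    "description": "Uses terms for annotation",
    "examples": [{"url": "https://example.org/record/1", "description": "An annotated record"}],
}
bad = {"user": "not a url", "seeAlso": "ftp://example.org/entry"}

print(validate_usages([good]))
for problem in validate_usages([bad]):
    print(problem)
```

Returning a list of messages rather than raising on the first problem lets a dashboard report everything wrong with an entry in one pass.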
FP 9 - Documented Plurality of Users
Automated checks:
Mechanism:
We can pull the tracker value from the ontology YAML. We should ensure that this tracker resolves (does not return an HTTP status of 400 or higher). It would be nice to check if there is activity on the tracker, but I'm not sure if that is possible at this time. I'm open to suggestions. If the ontology does not have a tracker, this check fails.
We can also look at the usages tag from the ontology YAML. If there are no documented usages, the ontology will get a warning. The usages should contain a user property with a valid URL. Perhaps if the URL does not resolve, we just return an info message.
We may need to standardize the usages tag. Currently, there are multiple ways that people have inserted usages. For example, ENVO contains two different examples of usages:
I propose the following format for usages: