OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

Principle #9 users - automated validation #1008

Open beckyjackson opened 5 years ago

beckyjackson commented 5 years ago

FP 9 - Documented Plurality of Users

Automated checks:

  1. Is there a valid issue tracker?
  2. Are there stated usages?

Mechanism:

We can pull the tracker value from the ontology YAML. We should ensure that this tracker resolves (does not return an HTTP status >= 400). It would be nice to check whether there is activity on the tracker, but I'm not sure that is possible at this time. I'm open to suggestions. If the ontology does not have a tracker, this check fails.
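A minimal sketch of the tracker check, assuming the ontology YAML has already been parsed into a dict with a `tracker` key. The function names, the `ERROR`/`PASS` levels, and the use of a HEAD request are illustrative assumptions, not the registry's actual implementation. The status lookup is passed in as a parameter so the logic can be exercised without network access.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if it cannot be reached."""
    request = Request(url, method="HEAD", headers={"User-Agent": "fp9-check"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def check_tracker(ontology, get_status=fetch_status):
    """Return (level, message) for the tracker part of the FP 9 check."""
    tracker = ontology.get("tracker")
    if not tracker:
        return ("ERROR", "no tracker declared")
    status = get_status(tracker)
    if status is None or status >= 400:
        return ("ERROR", f"tracker {tracker} does not resolve (status: {status})")
    return ("PASS", f"tracker {tracker} resolves")
```

Some servers reject HEAD requests, so a real check might need to fall back to GET before reporting a failure.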

We can also look at the usages tag from the ontology YAML. If there are no documented usages, the ontology will get a warning. The usages should contain a user property with a valid URL. Perhaps if the URL does not resolve, we just return an info message.
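The usages check could look roughly like the sketch below. It assumes the parsed YAML is a dict and that `resolves` is a caller-supplied function returning True when a URL resolves. The `WARN`/`INFO` labels and the choice to warn on a missing `user` are assumptions for illustration.

```python
def check_usages(ontology, resolves):
    """Return a list of (level, message) tuples for the usages part of FP 9.

    `resolves` is a callable that takes a URL and returns True if it resolves.
    """
    usages = ontology.get("usages") or []
    if not usages:
        return [("WARN", "no documented usages")]
    results = []
    for index, usage in enumerate(usages):
        user = usage.get("user") if isinstance(usage, dict) else None
        if not isinstance(user, str) or not user.startswith(("http://", "https://")):
            # Assumption: a missing or non-URL `user` is reported as a warning.
            results.append(("WARN", f"usage {index}: missing or invalid 'user' URL"))
        elif not resolves(user):
            results.append(("INFO", f"usage {index}: 'user' URL {user} does not resolve"))
    return results
```

An empty result list means that every usage has a `user` URL that resolved.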

We may need to standardize the usages tag. Currently, there are multiple ways that people have inserted usages. For example, ENVO contains two different examples of usages:

usages:
 - type: data-annotation
   description: "describing species habitats"
   examples:
     url: http://eol.org/pages/211700/data
   resources:
     url: http://eol.org
     label: EOL

usages:
  - user: http://oceans.taraexpeditions.org/en/
    description: Samples collected during Tara Oceans expedition are annotated with ENVO
    example:
      - url: https://www.ebi.ac.uk/metagenomics/projects/ERP001736/samples/ERS487899
        description: "Sample collected during the Tara Oceans expedition (2009-2013) at station TARA_004 (latitudeN=36.5533, longitudeE=-6.5669)"

I propose the following format for usages:

usages:
  - user: required URL
    type: optional text
    description: required text
    example:
      - url: required URL
        description: required text

nataled commented 5 years ago

From the EWG discussion on this:

Partial automation possible, especially with respect to use of its terms in other ontologies and citations.

Chris M commented: The curation of usages must be manual and closely vetted by OBO Foundry.

We have usages partially curated here:

https://github.com/OBOFoundry/OBOFoundry.github.io/issues/451

Once in place the checks themselves can be automated.

It is also easy to check things like GitHub activity. While it's conceivable that some ontologies with multiple users don't use GitHub, it is at least a meaningful signal.

jamesaoverton commented 5 years ago

The principle says "Use of the target ontology’s term IRIs in other ontologies. This can be evidenced by linking to the other ontology that uses an ontology term IRI from this ontology". We could search for term use in other ontologies.
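As a rough illustration of searching for term use, the sketch below filters a list of IRIs collected from another ontology down to those in the target ontology's OBO PURL namespace. How the IRIs are extracted from the other ontology is left out. The function name is hypothetical and the sample IRIs are placeholders.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def terms_used_from(id_space, iris):
    """Return the sorted, de-duplicated IRIs in `iris` that belong to `id_space`."""
    namespace = f"{OBO_BASE}{id_space}_"
    return sorted({iri for iri in iris if iri.startswith(namespace)})


# Placeholder IRIs standing in for those found in another ontology's axioms.
other_ontology_iris = [
    "http://purl.obolibrary.org/obo/ENVO_00000447",
    "http://purl.obolibrary.org/obo/UBERON_0000955",
    "http://purl.obolibrary.org/obo/ENVO_00000447",
    "http://purl.obolibrary.org/obo/ENVO_00002149",
]
print(terms_used_from("ENVO", other_ontology_iris))
```

The ID space is matched case-sensitively because some OBO prefixes are mixed case.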

cmungall commented 5 years ago

I agree with @beckyjackson's proposed standardization of the usages tag.

I think querying for ontology usage also makes sense. It would be fun to do a more in-depth analysis to identify "citation rings" and other artefacts.

sbello commented 4 years ago

Could this check also look at the 'browser' section on the OBO Foundry page (https://github.com/OBOFoundry/OBOFoundry.github.io/blob/master/ontology/mp.md)? The MP entry lists the MGI, RGD, and Monarch browsers, and I was wondering if that should/could contribute to the plurality of users check.

cmungall commented 4 years ago

We can also query eutils to look at the number of citations of the publication(s).

We could also add this as links from the obo site, e.g. we track the uberon pmid as 22293552, can add a link to:

https://www.ncbi.nlm.nih.gov/pubmed?linkname=pubmed_pubmed_citedin&from_uid=22293552

Of course, many ontologies are under-cited, but it's a proxy.
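A sketch of the eutils query, using the E-utilities ELink endpoint with the same `pubmed_pubmed_citedin` link name as the URL above. The JSON layout that `count_citing` expects (`linksets` / `linksetdbs` / `links`) is an assumption about the `retmode=json` response and should be checked against a live response. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def citedin_url(pmid):
    """Build the ELink URL listing PubMed records that cite `pmid`."""
    query = urlencode({
        "dbfrom": "pubmed",
        "linkname": "pubmed_pubmed_citedin",
        "id": str(pmid),
        "retmode": "json",
    })
    return f"{ELINK}?{query}"


def count_citing(elink_json):
    """Count citing PMIDs in a parsed ELink response (layout is assumed)."""
    total = 0
    for linkset in elink_json.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("linkname") == "pubmed_pubmed_citedin":
                total += len(linksetdb.get("links", []))
    return total


def fetch_citation_count(pmid, timeout=30):
    """Query ELink over the network and return the number of citing records."""
    with urlopen(citedin_url(pmid), timeout=timeout) as response:
        return count_citing(json.load(response))
```

For example, `fetch_citation_count(22293552)` would query the Uberon publication. As far as I know, this link only covers citing articles that NCBI itself indexes, so the count is better read as a lower bound.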

We can also do a Google search for mentions of the ontology (but this can't be done via API AFAIK).

nataled commented 4 years ago

Also note that ontologies can be over-cited too. These are cases where the ontology was mentioned (usually as part of a "such as..." list) but not used or studied in any way. This is similar to what happens in OntoBee when it shows term usage in other ontologies, the vast majority of which are due to some wholesale import of the ontology (but the term in question was never used).

cmungall commented 4 years ago

Very good point @nataled! Dare I say it, a lot of this over-citation may come from papers about ontologies...

cmungall commented 4 years ago

There are no objections to the schema @beckyjackson proposes.

I would add: make examples mandatory, but multivalued, i.e. cardinality >= 1.
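Once the format is agreed, the check for a single usage entry could be sketched as below. It follows @beckyjackson's proposed fields (`user`, `type`, `description`, `example`) plus the "at least one example" rule. The helper names and the problem messages are made up for illustration.

```python
from urllib.parse import urlparse


def is_url(value):
    """True if `value` is a string that parses as an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_text(value):
    """True if `value` is a non-blank string."""
    return isinstance(value, str) and bool(value.strip())


def validate_usage(usage):
    """Return a list of problems for one usage entry; an empty list means valid."""
    problems = []
    if not is_url(usage.get("user")):
        problems.append("user: required URL")
    if "type" in usage and not is_text(usage["type"]):
        problems.append("type: optional text")
    if not is_text(usage.get("description")):
        problems.append("description: required text")
    examples = usage.get("example")
    if not isinstance(examples, list) or len(examples) < 1:
        problems.append("example: at least one entry required")
        return problems
    for i, example in enumerate(examples):
        if not isinstance(example, dict):
            problems.append(f"example[{i}]: expected a mapping")
            continue
        if not is_url(example.get("url")):
            problems.append(f"example[{i}].url: required URL")
        if not is_text(example.get("description")):
            problems.append(f"example[{i}].description: required text")
    return problems
```

Applied to the two ENVO entries quoted earlier, the Tara Oceans usage would pass, while the EOL usage would be flagged for lacking `user` and `example`.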

jamesaoverton commented 3 years ago

@apmody and I are working on this in #1371. The proposed schema above is a little too simple. People are making good use of seeAlso to point to Biosharing/FAIRSharing, and of reference to link to publications about the usage. So we're going to try this schema:

usages:
  - user: required URL
    type: optional text (how the ontology is used, e.g. annotation)
    description: required text
    seeAlso: optional URL (e.g. FAIRSharing entry)
    examples:
      - url: required URL
        description: required text
    publications:
      - id: required URL (DOI, PubMed, etc.)
        title: required text

matentzn commented 3 years ago

I like it!