DiseaseOntology / HumanDiseaseOntology

Repository for the Human Disease Ontology.
Creative Commons Zero v1.0 Universal

Implement automated removal of EFO xrefs for obsolete terms #1286

Closed allenbaron closed 8 months ago

allenbaron commented 8 months ago

Prompted by @sbello's comments in issue #1285, I created a new make rule update_efo that automatically removes xrefs to obsoleted EFO classes. It relies on Ontobee's SPARQL endpoint and a custom SPARQL query to identify the obsolete EFO classes (I could not find an EFO-specific SPARQL endpoint).

This approach does not currently take into account the reason for obsolescence or provide an opportunity to replace one term with another. It does create a robot diff to show what changed.
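As a rough illustration of the removal step described above (not the actual update_efo rule, which is implemented with make, robot and SPARQL), the logic can be sketched in Python. The function names `efo_curie` and `remove_obsolete_xrefs` and the `EXAMPLE:0000001` xref are hypothetical; the IRI-to-CURIE conversion mirrors the STRAFTER/CONCAT expression in the SPARQL query in this PR.

```python
from typing import Dict, List, Set, Tuple


def efo_curie(iri: str) -> str:
    """Convert an EFO class IRI to the 'EFO:NNNNNNN' form used in DO xrefs."""
    _, sep, local_id = iri.partition("EFO_")
    if not sep or not local_id:
        raise ValueError(f"not an EFO class IRI: {iri}")
    return "EFO:" + local_id


def remove_obsolete_xrefs(
    xrefs: Dict[str, List[str]], obsolete: Set[str]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split each term's xrefs into those kept and those removed.

    xrefs maps a DOID to its xref strings; obsolete is the set of
    obsolete EFO CURIEs (assumed to come from the endpoint query).
    """
    kept: Dict[str, List[str]] = {}
    removed: Dict[str, List[str]] = {}
    for doid, values in xrefs.items():
        kept[doid] = [x for x in values if x not in obsolete]
        dropped = [x for x in values if x in obsolete]
        if dropped:
            removed[doid] = dropped
    return kept, removed


# Hypothetical input: one real pair from the diff plus a placeholder xref.
obsolete_ids = {efo_curie("http://www.ebi.ac.uk/efo/EFO_0000270")}
kept, removed = remove_obsolete_xrefs(
    {"DOID:2841": ["EFO:0000270", "EXAMPLE:0000001"]}, obsolete_ids
)
```

The `removed` mapping plays the same role as the robot diff: it records exactly which xrefs were dropped so they can be reviewed afterwards.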

The following xrefs were removed when I ran the rule (the changes are included in this PR). robot diff output:

20 axioms in left ontology but not in right ontology:
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:0050741>[alcohol dependence] "EFO:0003829")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:10652>[Alzheimer's disease] "EFO:0000249")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:11555>[Fuchs' endothelial dystrophy] "EFO:0003946")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:11782>[astigmatism] "EFO:0004222")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:11830>[myopia] "EFO:0003927")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:14330>[Parkinson's disease] "EFO:0002508")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:1686>[glaucoma] "EFO:0000516")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:2377>[multiple sclerosis] "EFO:0003885")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:2841>[asthma] "EFO:0000270")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:3312>[bipolar disorder] "EFO:0000289")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:332>[amyotrophic lateral sclerosis] "EFO:0000253")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:4007>[bladder carcinoma] "EFO:0000292")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:5419>[schizophrenia] "EFO:0000692")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:6364>[migraine] "EFO:0003821")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:8398>[osteoarthritis] "EFO:0002506")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:8986>[narcolepsy] "EFO:0000614")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9074>[systemic lupus erythematosus] "EFO:0002690")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9352>[type 2 diabetes mellitus] "EFO:0001360")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9744>[type 1 diabetes mellitus] "EFO:0001359")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9835>[refractive error] "EFO:0003908")

0 axioms in right ontology but not in left ontology:

EFO does not use "term replaced by"; instead it uses efo1:reason_for_obsolescence, a free-text string that sometimes identifies the replacement term but is not consistent enough to support an automated check for replacement terms.

After running this, I reviewed the reason for obsolescence of each removed EFO term using the following SPARQL query (run on the DO-KB SPARQL sandbox):

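PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX efo1: <http://www.ebi.ac.uk/efo/>

SELECT ?doid ?efo_id ?dep_reason
WHERE {
    # Federated query to Ontobee: find deprecated EFO classes and,
    # where present, their free-text reason for obsolescence.
    SERVICE <http://sparql.hegroup.org/sparql/> {
        GRAPH <http://www.ebi.ac.uk/efo/> {
            ?efo a owl:Class ;
                owl:deprecated ?efo_dep .
            OPTIONAL { ?efo efo1:reason_for_obsolescence ?dep_reason . }
            # Convert the EFO IRI to the "EFO:NNNNNNN" form used in DO xrefs.
            BIND( CONCAT( "EFO:", STRAFTER( str(?efo), "EFO_" ) ) AS ?efo_id )
        }
    }

    # Match non-deprecated DO classes that carry one of those xrefs.
    # A separate variable (?do_dep) keeps this check independent of
    # the value bound inside the SERVICE block.
    ?class oboInOwl:id ?doid ;
        oboInOwl:hasDbXref ?efo_id .
    FILTER NOT EXISTS { ?class owl:deprecated ?do_dep . }
}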

All of these terms were replaced by terms in other ontologies, usually Mondo or HP (and in at least one case ORDO). Sometimes the exact term URI is included in the reason for obsolescence, sometimes only the term name is included without even identifying which ontology the duplicate term is in.
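To illustrate why the reason text is hard to use automatically, here is a minimal Python sketch that pulls candidate replacement IRIs out of a reason string. The function name `candidate_replacement_iris` and both example strings are hypothetical (MONDO_0000000 is a placeholder, not a real replacement); actual EFO reason strings vary more than this.

```python
import re
from typing import List

# Any http(s) IRI; trailing sentence punctuation is stripped afterwards.
IRI_PATTERN = re.compile(r"https?://[^\s<>]+")


def candidate_replacement_iris(reason: str) -> List[str]:
    """Return IRIs mentioned in a reason_for_obsolescence string, if any."""
    if not reason:
        return []
    return [match.rstrip(".,;)") for match in IRI_PATTERN.findall(reason)]


# Hypothetical reason strings showing the two cases described above.
with_iri = "Duplicate of http://purl.obolibrary.org/obo/MONDO_0000000."
name_only = "Use the equivalent HP term instead"

found = candidate_replacement_iris(with_iri)
missing = candidate_replacement_iris(name_only)
```

When only a term name is given, nothing can be extracted, so a curator still has to look up the intended replacement by hand.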

@lschriml, I think it's fairly safe to use an automated approach to remove EFO xrefs, since we are not actively adding EFO diseases and EFO is actively replacing its disease terms with HP and Mondo terms. Can we merge this?

If you prefer, I can alter this to instead create a curation queue that includes EFO's reason for obsolescence. The only downside would be the need to remove undesirable xrefs by hand in Protégé.
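A curation queue of that kind could be as simple as a TSV built from the query results. The sketch below assumes each result row is a dict with the query's doid, efo_id and (optional) dep_reason columns; the function name `write_curation_queue` and the TSV layout are hypothetical, not something this PR implements.

```python
import csv
import io
from typing import Dict, Iterable, Optional, TextIO

QUEUE_COLUMNS = ["doid", "efo_id", "dep_reason"]


def write_curation_queue(
    rows: Iterable[Dict[str, Optional[str]]], out: TextIO
) -> None:
    """Write one TSV line per DO term / obsolete EFO xref pair."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(QUEUE_COLUMNS)
    # Sort so repeated runs produce a stable, diff-friendly file.
    for row in sorted(rows, key=lambda r: (r["doid"], r["efo_id"])):
        # dep_reason is OPTIONAL in the query, so it may be absent.
        writer.writerow([row["doid"], row["efo_id"], row.get("dep_reason") or ""])


# Rows use ID pairs from the diff above; reasons are left empty here.
buffer = io.StringIO()
write_curation_queue(
    [
        {"doid": "DOID:2841", "efo_id": "EFO:0000270"},
        {"doid": "DOID:1686", "efo_id": "EFO:0000516", "dep_reason": None},
    ],
    buffer,
)
queue_text = buffer.getvalue()
```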