Prompted by @sbello's comments in issue #1285, I created a new make rule update_efo that automatically removes xrefs to obsoleted EFO classes. It relies on Ontobee's SPARQL endpoint and a custom SPARQL query to identify the obsolete EFO classes (I could not find an EFO-specific SPARQL endpoint).
This approach does not currently take into account the reason for obsolescence or provide an opportunity to replace one term with another. It does create a robot diff to show what changed.
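For reviewers who want the logic at a glance, here is a minimal Python sketch of the filtering step. It is not the implementation in this PR (the rule itself uses SPARQL and robot). The function name `drop_obsolete_efo_xrefs`, the dict-of-lists input shape, and the `EXAMPLE:`/`EFO:9999999` identifiers are assumptions made for illustration; the two obsolete EFO IDs are taken from the diff in this PR.

```python
def drop_obsolete_efo_xrefs(xrefs_by_doid, obsolete_efo):
    """Return (cleaned mapping, list of removed (doid, xref) pairs).

    xrefs_by_doid: dict mapping a DOID CURIE to a list of xref CURIEs
                   (hypothetical in-memory shape, not the OWL file itself).
    obsolete_efo:  set of EFO CURIEs known to be obsolete, e.g. as
                   collected from a SPARQL query against Ontobee.
    """
    cleaned = {}
    removed = []
    for doid, xrefs in xrefs_by_doid.items():
        kept = []
        for xref in xrefs:
            # Only EFO xrefs are candidates; everything else is kept as-is.
            if xref.startswith("EFO:") and xref in obsolete_efo:
                removed.append((doid, xref))
            else:
                kept.append(xref)
        cleaned[doid] = kept
    return cleaned, removed


# Illustrative data: the EFO IDs below appear in this PR's diff;
# EXAMPLE:0000001 and EFO:9999999 are made-up placeholders.
obsolete = {"EFO:0000270", "EFO:0003927"}
xrefs = {
    "DOID:2841": ["EFO:0000270", "EXAMPLE:0000001"],
    "DOID:11830": ["EFO:0003927"],
    "DOID:0000000": ["EFO:9999999"],
}
cleaned, removed = drop_obsolete_efo_xrefs(xrefs, obsolete)
```

Note that, like the make rule, this only removes xrefs; it makes no attempt to substitute a replacement term.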
The following xrefs were removed when I ran the rule (the changes are included in this PR; robot diff output):
20 axioms in left ontology but not in right ontology:
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:0050741>[alcohol dependence] "EFO:0003829")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:10652>[Alzheimer's disease] "EFO:0000249")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:11555>[Fuchs' endothelial dystrophy] "EFO:0003946")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:11782>[astigmatism] "EFO:0004222")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:11830>[myopia] "EFO:0003927")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:14330>[Parkinson's disease] "EFO:0002508")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:1686>[glaucoma] "EFO:0000516")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:2377>[multiple sclerosis] "EFO:0003885")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:2841>[asthma] "EFO:0000270")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:3312>[bipolar disorder] "EFO:0000289")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:332>[amyotrophic lateral sclerosis] "EFO:0000253")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:4007>[bladder carcinoma] "EFO:0000292")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:5419>[schizophrenia] "EFO:0000692")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:6364>[migraine] "EFO:0003821")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:8398>[osteoarthritis] "EFO:0002506")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:8986>[narcolepsy] "EFO:0000614")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9074>[systemic lupus erythematosus] "EFO:0002690")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9352>[type 2 diabetes mellitus] "EFO:0001360")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9744>[type 1 diabetes mellitus] "EFO:0001359")
- AnnotationAssertion(<oboInOwl:hasDbXref>[database_cross_reference] <DOID:9835>[refractive error] "EFO:0003908")
0 axioms in right ontology but not in left ontology:
EFO does not use term replaced by; instead it uses efo1:reason_for_obsolescence, a text string that sometimes identifies the replacement term but is not consistent enough to support an automated check for replacement terms.
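To illustrate why an automated check is unreliable, the following Python sketch tries to pull a replacement identifier out of a free-text reason. The regular expression, the function name `find_replacement`, and all three sample strings are hypothetical; they are not actual EFO annotation values and only mimic the two situations described here (an identifier is present, or only a name is given).

```python
import re

# Matches either a full IRI or a CURIE-like ID from a few ontologies.
# The prefix list is an assumption chosen for this illustration.
ID_PATTERN = re.compile(r"https?://\S+|\b(?:MONDO|HP|Orphanet)[:_]\d+")


def find_replacement(reason):
    """Return the first identifier found in the text, or None."""
    match = ID_PATTERN.search(reason)
    if match is None:
        return None
    # Strip sentence punctuation that \S+ may have swallowed.
    return match.group(0).rstrip(".,;)")


# Made-up reasons for demonstration only.
samples = [
    "Duplicate of http://purl.obolibrary.org/obo/MONDO_9999999.",
    "Replaced by HP:9999999",
    "Duplicate of asthma in another ontology",
]
results = [find_replacement(s) for s in samples]
```

The third sample yields `None`: when only a term name is given, there is nothing for a pattern to anchor on, so a curator would still have to resolve the replacement by hand.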
I reviewed the reason for obsolescence for each of the removed EFO terms after running this with the following SPARQL query (run on DO-KB SPARQL sandbox):
All of these terms were replaced by terms in other ontologies, usually Mondo or HP (and in at least one case ORDO). Sometimes the exact term URI is included in the reason for obsolescence; sometimes only the term name is given, without even identifying which ontology the duplicate term is in.
@lschriml, I think it's fairly safe to use an automated approach to remove EFO xrefs, since we are not actively adding EFO diseases and EFO is actively replacing its diseases with HP and Mondo terms. Can we merge this?
If you prefer, I can alter this to instead create a curation queue that includes EFO's reason for obsolescence. The only downside would be the need to remove undesirable xrefs by hand in Protege.