Open gkos-bio opened 6 years ago
@gkos-bio We have a ticket right now where we're looking at all ORDO terms in EFO (https://github.com/EBISPOT/efo/issues/225), as some of them are obsolete in ORDO but still exist in EFO. I'll make sure we check the descriptions of the existing ORDO terms in EFO alongside this. Thanks for bringing it to our attention!
The originating issue has been fixed in EFO 3:
http://35.241.144.112/ontologies/efo/terms?iri=http%3A%2F%2Fwww.orpha.net%2FORDO%2FOrphanet_2295
But #225 has not been closed yet, so I'll leave this one open too.
Note for EFO editors: @zoependlington kindly offered to run a SPARQL query to search for any term missing a description in EFO (i.e., not only ORDO terms).
We have 8510 terms that are missing definitions: https://docs.google.com/spreadsheets/d/14KJl8hG7CswSaHD5okZyAbL8q33y2aHl14L4hzA--Z8/edit?usp=sharing
The second tab lists all disease terms that are missing definitions (4453) and the third tab lists phenotypes missing definitions (198).
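For reference, a query of the kind mentioned above could look roughly like the sketch below. This is not the query that was actually run; it assumes that definitions are stored as `IAO:0000115` annotations, and the helper name `build_missing_definition_query` is made up for illustration. If the ontology uses a different definition property, pass its IRI instead.

```python
# Sketch only: the property IRI is an ASSUMPTION about how definitions are
# annotated; swap it if the ontology uses another property.
DEFINITION_PROPERTY = "http://purl.obolibrary.org/obo/IAO_0000115"


def build_missing_definition_query(definition_property: str = DEFINITION_PROPERTY) -> str:
    """Return a SPARQL query listing named classes with no definition annotation."""
    # Doubled braces are literal braces inside the f-string.
    return f"""
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?term ?label
WHERE {{
  ?term a owl:Class .
  FILTER(isIRI(?term))
  OPTIONAL {{ ?term rdfs:label ?label }}
  FILTER NOT EXISTS {{ ?term <{definition_property}> ?definition }}
}}
ORDER BY ?term
""".strip()
```

The `FILTER(isIRI(?term))` line skips anonymous class expressions, which never carry definitions and would otherwise inflate the count.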
Notes for EFO editors: some potentially useful strategies/considerations follow:
[x] Check in EFO 3 to see if there are fewer terms missing definitions
[ ] @zoependlington noted that the disease terms may be top priority, especially for Open Targets, and I expect that phenotypes would come second. Adam suggests focusing on phenotypes when we do get to this ticket.
[ ] If a term in EFO still has an exact mapping to its originating ontology, query that ontology for a definition, as one may have been added since EFO imported the term
[ ] I'll write a summary below showing the number of classes per ontology (rather than per topic) with some quick comments (e.g., NCBITaxon classes don't have definitions at all, so those are fine as they are in EFO)
We'd need to re-run @zoependlington's query above in EFO3 to gauge the current size and priority of this ticket.
@paolaroncaglia There are 7066 terms missing definitions in EFO3. 56 are phenotype terms and 2380 are disease terms.
New spreadsheet here: https://docs.google.com/spreadsheets/d/14KQoBGMh5V4LsT9QRLpmGFMX52pFUdZa2xEzNs50nRc/edit?usp=sharing
@zoependlington Thanks for making a spreadsheet of terms lacking definitions in EFO3. Here's a more detailed summary:
Re. phenotypes: I made a tab where I sorted them by ontology ("Phenotypes sorted by ontology"); 35 are HP terms that might be fixed when we do a fresh import of HPO (#437) (but would need to be double-checked after the import); 7 are EFO terms, so we'd need to create definitions for those ourselves; 14 are ORDO terms, so we'd need to check if they have definitions in ORDO now, and create them ourselves if not (assuming we continue to prefer not to do a fresh, batch import of all of ORDO).
Re. diseases: I made a tab where I sorted them by ontology ("Diseases sorted by ontology"); 3 are HP terms that might be fixed when we do a fresh import of HPO (#437) (but would need to be double-checked after the import); 505 are MONDO terms, so we should consult with MONDO on a strategy; 15 are EFO terms, so we'd need to create definitions for those ourselves; 1857 are ORDO terms, so we'd need to check if they have definitions in ORDO now, and create them ourselves if not (assuming we continue to prefer not to do a fresh, batch import of all of ORDO). For the ORDO terms, the size of the task would need to be considered.
Re. other types of terms: I made a tab where I sorted all terms by ontology ("All sorted by ontology"):
- 9 country names
- 21 BTO terms
- 492 CHEBI terms
- 499 CL terms
- 2 EO terms
- 78 FBbt terms
- 8 FMA terms
- 35 GO terms (see #444)
- 16 HANCESTRO terms
- 39 HP terms
- 528 MONDO terms
- 1448 NCBITaxon terms
- 3 PATO terms
- 5 PO terms
- 868 UBERON terms
- 7 UO terms
- 18 ZFA terms
- 1133 EFO terms
- 1857 ORDO terms

We may want to come back to these at a later stage.
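In case it helps when re-running this breakdown, here is a minimal sketch of how a per-ontology tally could be produced from a list of term IRIs. It is not how the spreadsheet tabs were made (those were sorted by hand); the function names are hypothetical, and it assumes the ontology can be read from the prefix of the IRI's last segment, with `Orphanet` mapped to ORDO. Apart from `Orphanet_2295`, the IRIs in the usage example are illustrative only.

```python
import re
from collections import Counter

# ASSUMPTION: IRI local names look like PREFIX_ID; some prefixes need renaming.
PREFIX_ALIASES = {"Orphanet": "ORDO"}


def ontology_prefix(iri: str) -> str:
    """Guess the source ontology from the last segment of a term IRI."""
    local_name = iri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    match = re.match(r"^([A-Za-z]+)_[A-Za-z0-9]+$", local_name)
    if match is None:
        return "unknown"
    prefix = match.group(1)
    return PREFIX_ALIASES.get(prefix, prefix)


def count_by_ontology(iris):
    """Count terms per guessed source ontology."""
    return Counter(ontology_prefix(iri) for iri in iris)


example_iris = [
    "http://www.orpha.net/ORDO/Orphanet_2295",
    "http://www.ebi.ac.uk/efo/EFO_0000408",
    "http://purl.obolibrary.org/obo/HP_0000118",
    "http://purl.obolibrary.org/obo/HP_0001250",
]
counts = count_by_ontology(example_iris)
```

Terms whose IRIs do not follow the `PREFIX_ID` pattern are grouped under `unknown` so they can be reviewed by hand.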
@gkos-bio @afaulconbridge Gautier opened this ticket some time ago, pointing to terms missing descriptions (definitions) in the EFO disease branch. We have now re-evaluated the issue in EFO3 and prepared a summary (see the previous 2 comments). Could you please let us know whether fixing the broader issue (i.e., creating or adding definitions wherever they are missing) is of interest for Open Targets and, if so, which should take priority: diseases or phenotypes? Thanks!
Agreed strategy with Zoe:
Example: https://www.ebi.ac.uk/ols/ontologies/ordo/terms?iri=http%3A%2F%2Fwww.orpha.net%2FORDO%2FOrphanet_2295
is in EFO but has no description
https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fwww.orpha.net%2FORDO%2FOrphanet_2295
It looks like it's an automated process that is somehow broken for ORDO diseases.