EBISPOT / efo

GitHub repo for the Experimental Factor Ontology (EFO)
https://www.ebi.ac.uk/efo/

Missing descriptions in EFO diseases #228

Open gkos-bio opened 6 years ago

gkos-bio commented 6 years ago

Example: https://www.ebi.ac.uk/ols/ontologies/ordo/terms?iri=http%3A%2F%2Fwww.orpha.net%2FORDO%2FOrphanet_2295

is in EFO but has no description

https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fwww.orpha.net%2FORDO%2FOrphanet_2295

It looks like it's an automated process that is somehow broken for ORDO diseases.

zoependlington commented 6 years ago

@gkos-bio We have a ticket right now where we're looking at all ORDO terms in EFO (https://github.com/EBISPOT/efo/issues/225), as some of them are obsolete in ORDO but still exist in EFO. I'll make sure we check the descriptions of the existing ORDO terms in EFO alongside this. Thanks for bringing it to our attention!

paolaroncaglia commented 6 years ago

The originating issue has been fixed in EFO 3:

http://35.241.144.112/ontologies/efo/terms?iri=http%3A%2F%2Fwww.orpha.net%2FORDO%2FOrphanet_2295

But #225 has not been closed yet, so I'll leave this one open too.

paolaroncaglia commented 5 years ago

Note for EFO editors: @zoependlington kindly offered to run a SPARQL query to search for any terms missing descriptions in EFO (i.e. not only ORDO terms).
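
For reference, a minimal sketch of what such a query could look like, wrapped in Python so the definition property can be swapped. This is not the query that was actually run. It assumes definitions are stored with the IAO "definition" annotation property (IAO_0000115); the function name `missing_definition_query` is hypothetical, and the resulting string would still need to be run against a local copy of EFO with a SPARQL engine of your choice.

```python
# Assumption: EFO definitions use the IAO "definition" annotation property.
IAO_DEFINITION = "http://purl.obolibrary.org/obo/IAO_0000115"


def missing_definition_query(definition_property: str = IAO_DEFINITION) -> str:
    """Build a SPARQL query listing named classes with no definition annotation.

    Hypothetical helper for illustration only.
    """
    return f"""
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?term ?label
WHERE {{
  ?term a owl:Class .
  # Skip anonymous class expressions (blank nodes).
  FILTER(isIRI(?term))
  OPTIONAL {{ ?term rdfs:label ?label }}
  # Keep only classes that carry no definition annotation at all.
  FILTER NOT EXISTS {{ ?term <{definition_property}> ?definition }}
}}
ORDER BY ?term
""".strip()


if __name__ == "__main__":
    print(missing_definition_query())
```

The `FILTER NOT EXISTS` clause does the actual work. The label is optional so that terms lacking both a label and a definition are still reported.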

zoependlington commented 5 years ago

We have 8510 terms that are missing definitions: https://docs.google.com/spreadsheets/d/14KJl8hG7CswSaHD5okZyAbL8q33y2aHl14L4hzA--Z8/edit?usp=sharing

The second tab is all disease terms that are missing definitions (4453) and the third tab is phenotypes missing definitions (198)

paolaroncaglia commented 5 years ago

Notes for EFO editors: some potentially useful strategies/considerations follow:

paolaroncaglia commented 5 years ago

We'd need to re-run @zoependlington's query above in EFO3 to gauge the current size and priority of this ticket.

zoependlington commented 5 years ago

@paolaroncaglia There are 7066 terms missing definitions in EFO3. 56 are phenotype terms and 2380 are disease terms.

New spreadsheet here: https://docs.google.com/spreadsheets/d/14KQoBGMh5V4LsT9QRLpmGFMX52pFUdZa2xEzNs50nRc/edit?usp=sharing

paolaroncaglia commented 5 years ago

@zoependlington Thanks for making a spreadsheet of terms lacking definitions in EFO3. Here's a more detailed summary:

Re. phenotypes: I made a tab where I sorted them by ontology ("Phenotypes sorted by ontology"); 35 are HP terms that might be fixed when we do a fresh import of HPO (#437) (but would need to be double-checked after the import); 7 are EFO terms, so we'd need to create definitions for those ourselves; 14 are ORDO terms, so we'd need to check if they have definitions in ORDO now, and create them ourselves if not (assuming we continue to prefer not to do a fresh, batch import of all of ORDO).

Re. diseases: I made a tab where I sorted them by ontology ("Diseases sorted by ontology"); 3 are HP terms that might be fixed when we do a fresh import of HPO (#437) (but would need to be double-checked after the import); 505 are MONDO terms, so we should consult with MONDO on a strategy; 15 are EFO terms, so we'd need to create definitions for those ourselves; 1857 are ORDO terms, so we'd need to check if they have definitions in ORDO now, and create them ourselves if not (assuming we continue to prefer not to do a fresh, batch import of all of ORDO). For the ORDO terms, the size of the task would need to be considered.

Re. other types of terms: I made a tab where I sorted all terms by ontology ("All sorted by ontology"); 9 are country names; 21 are BTO terms; 492 are CHEBI terms; 499 are CL terms; 2 are EO terms; 78 are FBbt terms; 8 are FMA terms; 35 are GO terms (see #444); 16 are HANCESTRO terms; 39 are HP terms; 528 are MONDO terms; 1448 are NCBITaxon terms; 3 are PATO terms; 5 are PO terms; 868 are UBERON terms; 7 are UO terms; 18 are ZFA terms; 1133 are EFO terms; 1857 are ORDO terms. We may want to come back to these at a later stage.
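
A per-ontology breakdown like the one above can be reproduced from a list of term IRIs with a short script. The sketch below is an assumption about how one might do it, not how the spreadsheet tabs were built: it takes the text before the first underscore in each IRI's local name as the ontology prefix and maps `Orphanet` to ORDO. The names `ontology_of` and `count_by_ontology` are hypothetical, and the sample IRIs are illustrative rather than taken from the spreadsheet.

```python
from collections import Counter

# Assumption: ORDO IRIs use the local-name prefix "Orphanet".
PREFIX_TO_ONTOLOGY = {"Orphanet": "ORDO"}


def ontology_of(iri: str) -> str:
    """Guess the source ontology from the local name of a term IRI."""
    # Take whatever follows the last "/" or "#", e.g. "Orphanet_2295".
    local_name = iri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    if "_" not in local_name:
        return "unknown"
    prefix = local_name.split("_", 1)[0]
    return PREFIX_TO_ONTOLOGY.get(prefix, prefix)


def count_by_ontology(iris):
    """Count how many term IRIs belong to each ontology."""
    return Counter(ontology_of(iri) for iri in iris)


if __name__ == "__main__":
    # Illustrative sample only.
    sample = [
        "http://www.orpha.net/ORDO/Orphanet_2295",
        "http://www.ebi.ac.uk/efo/EFO_0000408",
        "http://purl.obolibrary.org/obo/MONDO_0000001",
        "http://purl.obolibrary.org/obo/NCBITaxon_9606",
    ]
    for ontology, count in sorted(count_by_ontology(sample).items()):
        print(f"{ontology}\t{count}")
```

Terms whose local names have no underscore (country names, for instance, might fall into this group) are reported as `unknown` and would need to be checked by hand.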

paolaroncaglia commented 5 years ago

@gkos-bio @afaulconbridge Gautier opened this ticket some time ago, pointing to terms missing descriptions (definitions) in the EFO disease branch. We have now re-evaluated the issue in EFO3 and prepared a summary (see the previous 2 comments). Could you please let us know whether fixing the broader issue (i.e. creating or adding definitions wherever they are missing) is of interest for Open Targets and, if so, which should take priority: diseases or phenotypes? Thanks!

paolaroncaglia commented 5 years ago

Also see https://github.com/EBISPOT/efo/issues/544

paolaroncaglia commented 4 years ago

Agreed strategy with Zoe: