d0choa closed 3 years ago
Great extraction + curation effort! Some thoughts to optimise incorporation of results, all focused on the first list (suggested synonyms):
Regarding the first point, it's quite common for dictionaries to contain ambiguous synonyms. For example, we deal with gene/protein dictionaries in which some synonyms (e.g. p55) can refer to multiple entities. That's not a problem with the ontology/dictionary, which simply captures a reality of the semantic space.
Not including ambiguous synonyms will prevent users from grounding labels to EFO. Our take is that disambiguating synonyms or acronyms should be a downstream process handled by specific applications, not a feature of the ontology. This is already a widespread issue in EFO and other dictionaries, so we are setting up systems to handle it. More info.
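To make the point concrete, here is a minimal sketch of how a downstream application might keep ambiguous synonyms and defer the choice. Everything in it is an assumption for illustration: the identifiers and the second synonym are made-up placeholders, not real gene or EFO records, and `build_index` and `candidates` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical (term_id, synonym) pairs. The IDs are placeholders, not real identifiers.
SYNONYMS = [
    ("GENE:A", "p55"),
    ("GENE:B", "p55"),
    ("GENE:A", "alpha kinase"),
]


def build_index(pairs):
    """Map each case-folded synonym to the set of term IDs that carry it."""
    index = defaultdict(set)
    for term_id, synonym in pairs:
        index[synonym.casefold()].add(term_id)
    return index


def candidates(index, label):
    """Return every term ID matching the label; more than one means the label is ambiguous."""
    return sorted(index.get(label.casefold(), set()))


INDEX = build_index(SYNONYMS)
```

With this shape the dictionary keeps every synonym, and an application that receives more than one candidate for a label can apply its own context-based disambiguation.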
On a separate note, separating acronyms from synonyms would be highly beneficial for us as well. I know this has been mentioned in the past, so I'm just reiterating that we will have an application for it.
I have added all EFO/Orphanet acronyms as synonyms for now until we find a better way to represent these separately.
I have also imported all HP and Mondo terms in the second table into EFO. However, we are currently unable to import NCIT and OMIT terms, so if those, or any labels without a suggested term, are required, we will need new term requests for them. I can set up a template for a bulk request if needed.
As part of an Open Targets project in collaboration with EPMC, we have processed the entire corpus to detect diseases or phenotypes using NER. In a subsequent step, we ground each label to its corresponding term in EFO using all available names and synonyms.
This process has identified several gaps in EFO: labels that are frequently used in the literature but are not present in EFO, either as terms or as synonyms.
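As a rough illustration of the gap detection described above (a sketch under stated assumptions, not our actual pipeline): the term IDs, names, counts, and the `min_count` threshold are all invented for the example, and `build_lookup` and `find_gaps` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical dictionary of term ID -> names and synonyms (placeholder IDs).
TERM_NAMES = {
    "EX:0001": ["asthma", "bronchial asthma"],
    "EX:0002": ["type 2 diabetes", "T2D"],
}


def build_lookup(term_names):
    """Map each case-folded name or synonym to the set of term IDs using it."""
    lookup = {}
    for term_id, names in term_names.items():
        for name in names:
            lookup.setdefault(name.casefold(), set()).add(term_id)
    return lookup


def find_gaps(label_counts, lookup, min_count=1):
    """Return (label, count) pairs for frequent labels that ground to no term, most frequent first."""
    gaps = Counter({
        label: count
        for label, count in label_counts.items()
        if count >= min_count and label.casefold() not in lookup
    })
    return gaps.most_common()


# Invented label frequencies standing in for NER output over a corpus.
LABEL_COUNTS = Counter({"asthma": 10, "t2d": 5, "long covid": 7, "rare thing": 1})
GAPS = find_gaps(LABEL_COUNTS, build_lookup(TERM_NAMES), min_count=2)
```

The resulting list of ungrounded labels is the kind of candidate set that would then go through manual curation.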
I have curated the first set of highly frequent labels for your consideration and divided them into the following two groups:
Happy to provide more detail about our findings.