EBISPOT / efo

Github repo for the Experimental Factor Ontology (EFO)
https://www.ebi.ac.uk/efo/

Synonym policy #276

Open paolaroncaglia opened 5 years ago

paolaroncaglia commented 5 years ago

Stemming from an email from Adam Faulconbridge:

“Quick question about synonym handling in EFO, and EFO 3 in particular. OpenTargets has noticed that in many cases there are a lot of synonyms that have very low information content. For example, for "heart disease" there is:

CARDIAC DIS; Cardiac Disease; Cardiac Diseases; Disease, Cardiac; Disease, Heart; Diseases, Cardiac; Diseases, Heart; HEART DIS; Heart Diseases; cardiac disease; disease of heart; heart disease; heart disease or disorder; heart disorder; heart trouble

Which could pretty much be reduced to:

heart disease; cardiac disease; heart disorder; heart trouble

Is there any policy on handling of synonyms? Any tooling available for this?”

@simonjupp replied: “This is partly an artefact of us importing properties from multiple sources and partly not having a good policy on this. Zoe and Paola are already looking into writing a policy on labels, we should have one for synonyms and look to improve EFO with respect to this.”
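On the tooling question, here is a minimal Python sketch of how the obvious duplicates in the "heart disease" example could be collapsed automatically. It is not existing EFO tooling: the function names are made up for illustration, the list is my reading of where the synonym boundaries fall in the email, and the rules (lowercasing, un-inverting "Disease, Cardiac" style entries, dropping a naive trailing plural "s") are assumptions that would need review before use on real data.

```python
# Synonyms from the email, split by hand (the boundaries are an assumption).
HEART_DISEASE_SYNONYMS = [
    "CARDIAC DIS", "Cardiac Disease", "Cardiac Diseases", "Disease, Cardiac",
    "Disease, Heart", "Diseases, Cardiac", "Diseases, Heart", "HEART DIS",
    "Heart Diseases", "cardiac disease", "disease of heart", "heart disease",
    "heart disease or disorder", "heart disorder", "heart trouble",
]


def normalize(synonym: str) -> str:
    """Return a crude comparison key for a synonym string."""
    text = synonym.strip().lower()
    # Un-invert comma forms: "disease, cardiac" -> "cardiac disease".
    if "," in text:
        head, _, tail = text.partition(",")
        text = f"{tail.strip()} {head.strip()}"
    words = text.split()
    # Naive singularization of the last word only ("diseases" -> "disease").
    # Short tokens such as "dis" are left alone.
    if words:
        last = words[-1]
        if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
            words[-1] = last[:-1]
    return " ".join(words)


def reduce_synonyms(synonyms):
    """Group synonyms by normalized key and keep one representative each."""
    groups = {}
    for s in synonyms:
        groups.setdefault(normalize(s), []).append(s)

    def badness(s):
        # Prefer forms without a comma, then forms that are already lowercase.
        return ("," in s, s != s.lower())

    return [min(group, key=badness) for group in groups.values()]


reduced = reduce_synonyms(HEART_DISEASE_SYNONYMS)
```

With these rules the 15 strings collapse to 8. Getting down to the four suggested in the email still needs a curator's call on the abbreviations ("CARDIAC DIS", "HEART DIS"), on "disease of heart" and on "heart disease or disorder", which is part of why a written policy is needed.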

As a start, I’m copying here some useful bits from the MONDO Editors guide (https://docs.google.com/document/d/19bp9MpCHCxbjMmbntB2e5gZNzzNlu06DnDB8xcoSXK8/edit# - thanks @nicolevasilevsky and @cmungall!)

“Class Metadata

The standard OBO properties are used: ...

Synonyms. Use broad/narrow/exact/related wisely. See: uberon synonyms guide. TODO: still to clean up a lot of synonym scopes seeded from external ontologies.

We tend to use BROAD/NARROW generously, even if the sub/super exists. This is because it is useful to annotate other ontologies' usages of synonyms.

Synonyms

Use lowercase, even for initial letter, except for proper names (note: many syns remain with leading capitalization, this is improving).

Always annotate synonyms with xrefs. Many of these are currently DOID, Orphanet, GARD, etc IDs. We will add more directly referencing a publication (PMID CURIEs). Also add editor ID where appropriate (ORCID).

Always indicate synonym scope. These are incorrect in many places where they have been brought in externally. Do not trust scope if there is no synonym xref other than DO.

We follow a lot of the same rules as Uberon for text mining: https://github.com/obophenotype/uberon/wiki/Using-uberon-for-text-mining

Some synonyms are annotated with EXCLUDE, e.g. “NOS” synonyms. It is useful to have these in the edit version, but these are filtered on release.

We may also mark synonyms with DEPRECATED. E.g. all occurrences of “mental retardation” should be “intellectual disability”.

We try and avoid including things in this list: https://en.wikipedia.org/wiki/List_of_medical_eponyms_with_Nazi_associations but if it’s established (e.g. Wegener granulomatosis) may include as a syn and mark DEPRECATED”
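To make the quoted rules concrete, here is a rough Python sketch of what an automated check could look like. The `Synonym` class, the `lint` and `release_view` functions and the `EXAMPLE:` xref IDs are all hypothetical; this is neither MONDO's nor EFO's actual tooling, only an illustration of three of the rules (lowercase except proper names, always give a scope, always give an xref) plus the "filtered on release" behaviour for EXCLUDE.

```python
from dataclasses import dataclass, field

# The four scopes named in the guide.
SCOPES = {"EXACT", "BROAD", "NARROW", "RELATED"}


@dataclass
class Synonym:
    text: str
    scope: str = ""
    xrefs: list = field(default_factory=list)
    flags: set = field(default_factory=set)  # e.g. {"EXCLUDE"}, {"DEPRECATED"}


def lint(syn, proper_names=()):
    """Return a list of policy problems found for one synonym."""
    problems = []
    if syn.scope not in SCOPES:
        problems.append("missing or unknown scope")
    if not syn.xrefs:
        problems.append("no xref")
    # Lowercase rule: report the first word that is not lowercase and is not
    # in the caller-supplied list of proper names.
    for word in syn.text.split():
        if word != word.lower() and word not in proper_names:
            problems.append(f"unexpected capitalization: {word}")
            break
    return problems


def release_view(synonyms):
    """Drop synonyms flagged EXCLUDE, as the guide says happens on release."""
    return [s for s in synonyms if "EXCLUDE" not in s.flags]


# Placeholder data; the xref IDs are invented for the example.
bad = Synonym("Cardiac Disease")
eponym = Synonym("Wegener granulomatosis", "EXACT", ["EXAMPLE:1"], {"DEPRECATED"})
nos = Synonym("heart disease NOS", "EXACT", ["EXAMPLE:2"], {"EXCLUDE"})
```

Here `lint(bad)` reports all three problems, `lint(eponym, proper_names={"Wegener"})` reports none, and `release_view([eponym, nos])` keeps only the first. The list of proper names has to come from somewhere (a curated list, most likely), which this sketch leaves open.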

paolaroncaglia commented 5 years ago

@simonjupp @zoependlington I'm not sure that this is high-priority enough to be addressed in the current sprint i.e. by next week. It will take some time to complete, and we probably need to discuss a global approach first - perhaps at a future EFO planning meeting? If that's the case, we should move this ticket away from the 'Next' queue.

paolaroncaglia commented 5 years ago

@simonjupp At the EFO meeting today, @zoependlington and I agreed that we need to discuss a strategy re. synonym policy with you, so we'll keep this ticket for discussion at the next EFO planning meeting. Thanks.

paolaroncaglia commented 5 years ago

At our EFO meeting today, we resolved to write a policy about synonyms that should be similar to the label policy (e.g., no uppercase unless necessary). We can refer to the MONDO editors' guide (https://docs.google.com/document/d/19bp9MpCHCxbjMmbntB2e5gZNzzNlu06DnDB8xcoSXK8/edit). Then we can fix synonyms as we go based on the policy and on applying some global strategy, similarly to https://github.com/EBISPOT/efo/issues/274.

paolaroncaglia commented 5 years ago

As a first pass at drafting a synonym policy, we'll start with synonyms of EFO terms that are currently tagged as therapeutic areas in Open Targets (#481).

paolaroncaglia commented 5 years ago

An example of synonym issues in a non-OT TA is "parasitic infection" (EFO:0001067), where several synonyms currently marked as exact are in fact not exact, e.g. "Parasitic infection of lung".
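Cases like this could be surfaced for review with a simple heuristic: an "exact" synonym that contains every word of the label plus extra content words is probably narrower than the label. A minimal Python sketch follows, assuming a small hand-picked stopword list; the function names are made up, and apart from "Parasitic infection of lung" the synonyms in the example are invented for illustration. It only proposes candidates and will produce false positives, so a curator still has to decide.

```python
# Words ignored when comparing; an assumption, extend as needed.
STOPWORDS = {"of", "the", "in", "and", "or"}


def content_tokens(text):
    """Lowercased word set of a string, minus commas and stopwords."""
    words = text.lower().replace(",", " ").split()
    return {w for w in words if w not in STOPWORDS}


def possibly_narrow(label, exact_synonyms):
    """Return exact synonyms whose content words strictly extend the label's."""
    label_tokens = content_tokens(label)
    return [s for s in exact_synonyms if label_tokens < content_tokens(s)]


candidates = possibly_narrow(
    "parasitic infection",
    ["Parasitic infection of lung", "parasitic disease", "infection, parasitic"],
)
```

In this example only "Parasitic infection of lung" is returned: "infection, parasitic" has the same content words as the label, and "parasitic disease" does not contain "infection".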

paolaroncaglia commented 5 years ago

Making notes here of tricky cases:

1) "Exact" synonyms that are identical to labels: see https://github.com/EBISPOT/efo/issues/481#issuecomment-505824763

2) "Exact" synonyms with Bioportal provenance: see https://github.com/EBISPOT/efo/issues/481#issuecomment-505865478

paolaroncaglia commented 5 years ago

A quick note to remind ourselves that it may be important to review synonyms of EFO terms because

paolaroncaglia commented 4 years ago

Note for when we get to address the multiplicity of synonyms/synonym policy: if a Mondo term is imported into EFO and mapped to an existing EFO term, and the Mondo term comes with a "preferred label" annotation (visible in Protege) that is identical to the EFO label, then in OLS it will appear as if the EFO term has a synonym that's the same as its label.
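Synonyms that merely repeat the label are easy to detect mechanically. A minimal sketch, assuming labels and synonyms are available as plain strings (how they are pulled out of the ontology is left open, and the function name is hypothetical); the comparison ignores case and extra whitespace:

```python
def redundant_synonyms(label, synonyms):
    """Return the synonyms equal to the label, ignoring case and spacing."""
    key = " ".join(label.split()).casefold()
    return [s for s in synonyms if " ".join(s.split()).casefold() == key]


dupes = redundant_synonyms(
    "heart disease",
    ["heart disease", "Heart  Disease", "cardiac disease"],
)
```

Here `dupes` contains the first two strings. Whether such synonyms should be removed at source or filtered at release time is a policy decision.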

paolaroncaglia commented 2 years ago

@zoependlington as far as I know, this issue hasn't been addressed yet, or at least not fully.