EBISPOT / efo

Github repo for the Experimental Factor Ontology (EFO)
https://www.ebi.ac.uk/efo/

Document the difference between the 4 released OWL files #926

Open dhimmel opened 3 years ago

dhimmel commented 3 years ago

EFO Release v3.25.0 contains the following OWL files:

I am curious as to how these files differ and haven't been able to find much information.

From an OpenTargets blog post:

we curated an extensive list of therapeutic areas that reflect the most appropriate body system, and therefore slimmed the ontology to ignore higher order terms (e.g. disease by anatomical system). The result is an EFO3-derived Open Targets Platform-specific profile-ontology which will be automatically generated with every monthly EFO release.

From opentargets/OnToma:

The ontology we use in the Open Targets platform is a subset (aka. slim) of the EFO ontology plus any HPO terms for which a valid EFO mapping could not be found.

Is there any other documentation I'm missing?

ravwojdyla commented 3 years ago

@dhimmel looking at the 3.11.0 release, the first release that contains those files, there is some info:

For Open Targets, we have also generated an Open Targets profile (which contains all of EFO with the new Open Targets therapeutic areas) and slim file (which contains just Open Targets therapeutic areas). Both are also attached to this release.

dhimmel commented 3 years ago

EFO vs EFO-OTAR node comparison

I compared efo.owl and efo_otar_profile.owl from v3.25.0 and found that EFO-OTAR adds 10 nodes and removes 57 nodes from EFO.
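For readers who want to reproduce this kind of comparison, here is a minimal sketch. It is not the code behind the numbers above. It assumes each OWL file has already been reduced to a collection of node identifiers by some loader, and the function name `compare_node_sets` and the toy identifiers are illustrative only.

```python
def compare_node_sets(base_nodes, derived_nodes):
    """Return (added, removed) identifiers of `derived_nodes` relative to `base_nodes`."""
    base = set(base_nodes)
    derived = set(derived_nodes)
    added = derived - base      # present only in the derived ontology
    removed = base - derived    # present only in the base ontology
    return sorted(added), sorted(removed)


# Toy data: a tiny hand-picked subset of identifiers, not the real node lists.
efo_nodes = {"EFO:0000408", "MONDO:0021199", "EFO:0000405"}
efo_otar_nodes = {"EFO:0000408", "OTAR:0000010", "OTAR:0000018"}

added, removed = compare_node_sets(efo_nodes, efo_otar_nodes)
print("added:", added)
print("removed:", removed)
```

With the real files, the two inputs would be the full identifier sets extracted from efo.owl and efo_otar_profile.owl.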

Nodes added by EFO-OTAR

Here are the nodes EFO-OTAR adds (in purple outline) and their ancestors:

image

identifier | name | depth | n_ancestors | n_descendants | ic_resnik | ic_sanchez | uri
-- | -- | -- | -- | -- | -- | -- | --
MONDO:0018797 | None | 5 | 6 | 1 | 0.98 | 1.00 | http://purl.obolibrary.org/obo/MONDO_0018797
OTAR:0000010 | respiratory or thoracic disease | 4 | 5 | 1147 | 0.50 | 0.31 | http://www.ebi.ac.uk/efo/OTAR_0000010
OTAR:0000019 | familial disease | 5 | 6 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000019
OTAR:0000008 | other | 4 | 5 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000008
OTAR:0000018 | genetic, familial or congenital disease | 4 | 5 | 7927 | 0.29 | 0.12 | http://www.ebi.ac.uk/efo/OTAR_0000018
OTAR:0000003 | cyst | 5 | 6 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000003
OTAR:0000014 | pregnancy or perinatal disease | 4 | 5 | 120 | 0.71 | 0.53 | http://www.ebi.ac.uk/efo/OTAR_0000014
OTAR:0000009 | injury, poisoning or other complication | 4 | 5 | 117 | 0.71 | 0.53 | http://www.ebi.ac.uk/efo/OTAR_0000009
OTAR:0000017 | reproductive system or breast disease | 4 | 5 | 859 | 0.53 | 0.34 | http://www.ebi.ac.uk/efo/OTAR_0000017
OTAR:0000006 | musculoskeletal or connective tissue disease | 4 | 5 | 3002 | 0.39 | 0.21 | http://www.ebi.ac.uk/efo/OTAR_0000006

One question I have is what is the purpose of adding "familial disease", "other", "cyst", since these are all leaf nodes? Are they actually a helpful way for OpenTargets to categorize disease? CC @d0choa. MONDO:0018797 also has no descendants, but appears to be a relic, soon to be removed, as per https://github.com/EBISPOT/efo/issues/938.

Nodes removed by EFO-OTAR

Here are the nodes EFO-OTAR removes (in purple outline) and their ancestors:

image

Removed nodes table:

identifier | name | depth | n_ancestors | n_descendants | ic_resnik | ic_sanchez | uri
-- | -- | -- | -- | -- | -- | -- | --
MONDO:0044999 | scalp disease | 7 | 8 | 8 | 0.95 | 0.80 | http://purl.obolibrary.org/obo/MONDO_0044999
MONDO:0021017 | synaptopathy | 6 | 7 | 13 | 0.93 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0021017
MONDO:0019038 | rare maxillo-facial surgical disease | 8 | 16 | 222 | 0.75 | 0.47 | http://purl.obolibrary.org/obo/MONDO_0019038
MONDO:0043786 | serositis | 5 | 6 | 10 | 0.94 | 0.77 | http://purl.obolibrary.org/obo/MONDO_0043786
MONDO:0044974 | disease of supramolecular complex | 6 | 7 | 389 | 0.62 | 0.42 | http://purl.obolibrary.org/obo/MONDO_0044974
MONDO:0021635 | neurocristopathy | 5 | 8 | 134 | 0.75 | 0.52 | http://purl.obolibrary.org/obo/MONDO_0021635
MONDO:0044969 | disease of membrane bound organelle | 6 | 7 | 403 | 0.62 | 0.41 | http://purl.obolibrary.org/obo/MONDO_0044969
MONDO:0021668 | disorder involving pain | 4 | 5 | 13 | 0.90 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0021668
EFO:1000755 | pigmentation disease | 6 | 11 | 117 | 0.78 | 0.53 | http://www.ebi.ac.uk/efo/EFO_1000755
MONDO:0044980 | disease of signal transduction | 6 | 7 | 125 | 0.73 | 0.53 | http://purl.obolibrary.org/obo/MONDO_0044980
MONDO:0044979 | disease by cell type | 6 | 7 | 506 | 0.60 | 0.39 | http://purl.obolibrary.org/obo/MONDO_0044979
MONDO:0021197 | disease by cellular component affected | 5 | 6 | 1339 | 0.49 | 0.29 | http://purl.obolibrary.org/obo/MONDO_0021197
MONDO:0024623 | otorhinolaryngologic disease | 6 | 7 | 337 | 0.64 | 0.43 | http://purl.obolibrary.org/obo/MONDO_0024623
MONDO:0044975 | disease of transporter activity | 6 | 7 | 74 | 0.77 | 0.58 | http://purl.obolibrary.org/obo/MONDO_0044975
MONDO:0024627 | phagocytic cell dysfunction | 7 | 8 | 47 | 0.83 | 0.62 | http://purl.obolibrary.org/obo/MONDO_0024627
MONDO:0002436 | nasal disorder | 7 | 10 | 40 | 0.89 | 0.64 | http://purl.obolibrary.org/obo/MONDO_0002436
MONDO:0021073 | paraneoplastic syndrome | 5 | 6 | 9 | 0.93 | 0.78 | http://purl.obolibrary.org/obo/MONDO_0021073
MONDO:0018652 | biological anomaly without phenotypic characterization | 5 | 6 | 4 | 0.96 | 0.86 | http://purl.obolibrary.org/obo/MONDO_0018652
MONDO:0044989 | foot disease | 6 | 7 | 10 | 0.93 | 0.77 | http://purl.obolibrary.org/obo/MONDO_0044989
MONDO:0044987 | face disease | 7 | 8 | 1719 | 0.50 | 0.27 | http://purl.obolibrary.org/obo/MONDO_0044987
MONDO:0020683 | acute disease | 4 | 5 | 89 | 0.75 | 0.56 | http://purl.obolibrary.org/obo/MONDO_0020683
MONDO:0021195 | disease by cellular process disrupted | 5 | 6 | 2008 | 0.45 | 0.25 | http://purl.obolibrary.org/obo/MONDO_0021195
EFO:0000524 | head and neck disorder | 5 | 6 | 2103 | 0.45 | 0.25 | http://www.ebi.ac.uk/efo/EFO_0000524
EFO:0009470 | soft tissue disease | 4 | 5 | 124 | 0.72 | 0.53 | http://www.ebi.ac.uk/efo/EFO_0009470
MONDO:0024317 | chronic pain syndrome | 5 | 6 | 6 | 0.95 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0024317
EFO:0000405 | digestive system disease | 5 | 6 | 1236 | 0.51 | 0.30 | http://www.ebi.ac.uk/efo/EFO_0000405
MONDO:0021670 | post-infectious syndrome | 5 | 7 | 2 | 0.99 | 0.93 | http://purl.obolibrary.org/obo/MONDO_0021670
MONDO:0017368 | systemic disease with skin involvement | 6 | 7 | 42 | 0.83 | 0.63 | http://purl.obolibrary.org/obo/MONDO_0017368
MONDO:0021196 | disease by molecular activity disrupted | 5 | 6 | 251 | 0.65 | 0.46 | http://purl.obolibrary.org/obo/MONDO_0021196
Orphanet:79389 | Premature aging | 5 | 6 | 83 | 0.75 | 0.57 | http://www.orpha.net/ORDO/Orphanet_79389
MONDO:0021147 | disorder of development or morphogenesis | 4 | 5 | 3827 | 0.36 | 0.19 | http://purl.obolibrary.org/obo/MONDO_0021147
MONDO:0044977 | disease of receptor activity | 6 | 7 | 7 | 0.95 | 0.81 | http://purl.obolibrary.org/obo/MONDO_0044977
MONDO:0017261 | systemic diseases with panuveitis | 6 | 7 | 6 | 0.96 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0017261
EFO:0009714 | chronic disease | 4 | 5 | 107 | 0.73 | 0.54 | http://www.ebi.ac.uk/efo/EFO_0009714
MONDO:0021674 | post-viral disorder | 5 | 6 | 56 | 0.79 | 0.61 | http://purl.obolibrary.org/obo/MONDO_0021674
MONDO:0002254 | syndromic disease | 4 | 5 | 2541 | 0.39 | 0.23 | http://purl.obolibrary.org/obo/MONDO_0002254
MONDO:0021673 | post-bacterial disorder | 5 | 6 | 1 | 0.98 | 1.00 | http://purl.obolibrary.org/obo/MONDO_0021673
EFO:0009903 | inflammatory disease | 4 | 5 | 597 | 0.57 | 0.37 | http://www.ebi.ac.uk/efo/EFO_0009903
MONDO:0021199 | disease by anatomical system | 4 | 5 | 10922 | 0.27 | 0.09 | http://purl.obolibrary.org/obo/MONDO_0021199
MONDO:0005042 | head disease | 6 | 7 | 2012 | 0.47 | 0.25 | http://purl.obolibrary.org/obo/MONDO_0005042
MONDO:0024626 | defective phagocytic cell engulfment | 6 | 10 | 8 | 0.96 | 0.80 | http://purl.obolibrary.org/obo/MONDO_0024626
MONDO:0044971 | disease of macromolecular complex | 6 | 7 | 155 | 0.72 | 0.51 | http://purl.obolibrary.org/obo/MONDO_0044971
MONDO:0020595 | disease of retroperitoneum | 6 | 7 | 18 | 0.94 | 0.72 | http://purl.obolibrary.org/obo/MONDO_0020595
EFO:0009479 | throat disease | 6 | 7 | 1 | 0.99 | 1.00 | http://www.ebi.ac.uk/efo/EFO_0009479
MONDO:0017259 | systemic diseases with anterior uveitis | 6 | 7 | 13 | 0.92 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0017259
MONDO:0021016 | channelopathy | 7 | 8 | 57 | 0.81 | 0.60 | http://purl.obolibrary.org/obo/MONDO_0021016
MONDO:0044965 | abdominal and pelvic region disorder | 5 | 6 | 977 | 0.53 | 0.32 | http://purl.obolibrary.org/obo/MONDO_0044965
MONDO:0020012 | systemic or rheumatic disease | 4 | 5 | 312 | 0.62 | 0.44 | http://purl.obolibrary.org/obo/MONDO_0020012
MONDO:0024505 | disorder by anatomical region | 4 | 5 | 4746 | 0.35 | 0.17 | http://purl.obolibrary.org/obo/MONDO_0024505
MONDO:0015938 | systemic disease | 5 | 6 | 257 | 0.65 | 0.46 | http://purl.obolibrary.org/obo/MONDO_0015938
MONDO:0044976 | disease of catalytic activity | 6 | 7 | 173 | 0.70 | 0.49 | http://purl.obolibrary.org/obo/MONDO_0044976
MONDO:0017260 | systemic diseases with posterior uveitis | 7 | 8 | 4 | 0.98 | 0.86 | http://purl.obolibrary.org/obo/MONDO_0017260
EFO:0009664 | disease of orbital region | 6 | 9 | 1481 | 0.52 | 0.28 | http://www.ebi.ac.uk/efo/EFO_0009664
MONDO:0044967 | limb disorder | 5 | 6 | 69 | 0.78 | 0.58 | http://purl.obolibrary.org/obo/MONDO_0044967
MONDO:0044990 | hand disease | 6 | 7 | 6 | 0.95 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0044990
EFO:0001058 | sensory system disease | 6 | 7 | 291 | 0.64 | 0.44 | http://www.ebi.ac.uk/efo/EFO_0001058
MONDO:0021194 | disease by subcellular system affected | 4 | 5 | 2901 | 0.40 | 0.22 | http://purl.obolibrary.org/obo/MONDO_0021194

Code

Code to produce these figures and tables is not yet available, but is based on nxontology. I hope to make the nxontology importer for EFO available soon.

d0choa commented 3 years ago

Thanks @dhimmel for the analysis. It's really useful. @zoependlington can provide more details.

From the Open Targets perspective, the background story behind the slim is that we wanted to align EFO with a more clinical interpretation. EFO has a lot of high-level organisational nodes that reflect anatomical characteristics (many of them can be seen in your analysis), but these have little or no clinical value (e.g. disease by anatomical system). Instead, the top nodes of the slim resemble other clinical classifications such as MedDRA.

In the process of reorganising the terms, a few terms had to be removed, relocated or split. You can find the logic behind most of the changes in the respective tickets. For the ones that you raised, I found the following:

@paolaroncaglia and @zoependlington can comment on these two.

Regarding Other, it's a placeholder for newly introduced terms in EFO that have no parentage relationship in the slim. We aimed to have it empty, as all diseases should be children of other root level terms (therapeutic areas). You can consider it an artefact of the process and we should eventually remove it.
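To illustrate the placeholder idea, here is a hedged sketch of how terms lacking a therapeutic-area ancestor could be detected. It is not the actual OTAR tooling. The `parents` mapping, the function names, and the toy term names are all assumptions made for the example.

```python
def ancestors(parents, node):
    """Collect every ancestor of `node` in a child -> parents mapping."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def unplaced_terms(parents, terms, therapeutic_areas):
    """Return terms that are neither a therapeutic area nor a descendant of one."""
    areas = set(therapeutic_areas)
    return sorted(
        term
        for term in terms
        if term not in areas and not (ancestors(parents, term) & areas)
    )


# Hypothetical toy hierarchy (child -> list of parents).
parents = {
    "asthma": ["respiratory or thoracic disease"],
    "respiratory or thoracic disease": ["disease"],
    "newly added term": ["disease"],
}
orphans = unplaced_terms(
    parents,
    terms=["asthma", "newly added term"],
    therapeutic_areas=["respiratory or thoracic disease"],
)
print(orphans)
```

Under this reading, anything returned by `unplaced_terms` is what would land under "other" until it gets a proper parent.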

dhimmel commented 3 years ago

Quoting @zoependlington from https://github.com/EBISPOT/efo/issues/927#issuecomment-760762229 regarding forced relationships in EFO-OTAR:

The forced relationships are defined in the subclasses templates file found in the temporary/working home of OTAR_profiler here: https://github.com/EBISPOT/otar_profiler

Just a note that the "final" version for use by Open Targets is the slim file, which only contains the therapeutic areas that are useful for annotating their data. The profile is our master EFO with a few extra terms, which will eventually be added to the master EFO file once we have completed the ongoing work with our profile and slim files to be compatible with the Open Targets pipelines and their needs.

Great to know about EBISPOT/otar_profiler. I see that otar_ta.sh is the script that creates efo_otar_profile.owl and efo_otar_slim.owl. allTAs.txt contains a list of therapeutic areas and newterms.tsv contains nodes added by EFO-OTAR.

Based on otar_ta.sh, it looks like efo_otar_slim.owl is derived from efo_otar_profile.owl by filtering to therapeutic areas and their descendants (via robot MIREOT --branch-from-terms). So is this useful for OpenTargets because it wants a hierarchy of diseases only, without other parts of the ontology?
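As a rough illustration of what "filtering to therapeutic areas and their descendants" means, here is a minimal Python sketch. It does not reproduce what ROBOT does internally. It assumes the hierarchy is available as a plain parent-to-children mapping, and the names `extract_branches`, `children`, and the toy terms are hypothetical.

```python
from collections import deque


def extract_branches(children, roots):
    """Return the roots plus all of their descendants in a parent -> children mapping."""
    keep = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in keep:
            continue  # already visited via another parent
        keep.add(node)
        queue.extend(children.get(node, ()))
    return keep


# Hypothetical toy hierarchy, not the real EFO-OTAR graph.
children = {
    "disease": ["respiratory or thoracic disease", "pregnancy or perinatal disease"],
    "respiratory or thoracic disease": ["asthma"],
    "measurement": ["body mass index"],
}
therapeutic_areas = ["respiratory or thoracic disease", "pregnancy or perinatal disease"]

slim_nodes = extract_branches(children, therapeutic_areas)
print(sorted(slim_nodes))
```

In this toy case the non-disease branch ("measurement") and the generic "disease" root fall outside the result, which matches the description of the slim as containing only the therapeutic areas and what sits beneath them.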

Regarding "the profile is our master EFO with a few extra terms, which will eventually be added to the master EFO file", does that mean the eventual plan is to take all the modifications in efo_otar_profile.owl and move them upstream to efo.owl? If so, does that mean efo_otar_profile.owl might eventually go away, because it would be the same as efo.owl? And does this also mean EFO intends to remove the "organisational nodes that attend to anatomical characteristics" in favor of EFO-OTAR's "clinical interpretation"?

Getting back to the original documentation request, it would be nice to have guidance in the README regarding when to use efo-base.owl, efo.owl, efo_otar_profile.owl, versus efo_otar_slim.owl. My current understanding is:

  1. efo-base.owl: use if you only want terms from the EFO namespace (subClassOf relationships might be incomplete?)
  2. efo.owl: use if you want the primary EFO release with terms from the EFO namespace and those imported from other ontologies
  3. efo_otar_profile.owl: use if you want the complete ontology, with modifications introduced by OpenTargets, which might eventually be adopted in efo.owl.
  4. efo_otar_slim.owl: use if you want an ontology of diseases rooted to therapeutic areas, as defined and used by OpenTargets

Is this understanding correct?