dhimmel / integrate

Scripts and resources to create Hetionet v1.0, a heterogeneous network for drug repurposing
https://doi.org/10.15363/thinklab.4

Questions about hetionet: metabolomics / side effects versus diseases #15

Open gcsh86 opened 5 years ago

gcsh86 commented 5 years ago

Hi Daniel,

For Hetionet, I have two brief questions and would like to hear your insights:

  1. On the metabolomics side, why didn't you use the HMDB database for linking metabolites, diseases, variants, genes, etc.?

  2. Why are sepsis and chronic fatigue categorized as side effects rather than diseases?

dhimmel commented 5 years ago

Thanks @gcsh86 for the questions.

On the metabolomics side, why didn't you use the HMDB database for linking metabolites, diseases, variants, genes, etc.?

There is no reason other than that I wasn't aware of an omics-wide resource for metabolite nodes and edges. I also wasn't sure whether metabolites would be redundant with compounds. Metabolomics is an area that I don't know much about, so there may be an opportunity I overlooked.

When considering an additional data resource, I recommend first drawing out which node and edge types that resource would contribute to the metagraph (Figure 1A of the manuscript). @gcsh86 if you have a specific proposal for which node and edge types could be generated from HMDB, I could provide more tailored feedback.

Why are sepsis and chronic fatigue categorized as side effects rather than diseases?

Some entities can conceptually belong to multiple node types. This is especially true for diseases, side effects, and symptoms: a single concept, such as sepsis or chronic fatigue, could plausibly be all three. For Hetionet, we created separate nodes for diseases, side effects, and symptoms. Therefore, fatigue can appear as up to three separate nodes, one per context. Run the following query at https://neo4j.het.io/browser/ and you will see that "fatigue" shows up in the names of both side effects and symptoms:

MATCH (node)
WHERE node.name =~ '(?i).*fatigue.*'
RETURN node
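To make the matching behavior of that query concrete, here is a minimal Python sketch that mimics it on a handful of made-up records. Cypher's `=~` operator tests the whole string against the regular expression, which is why the pattern carries `.*` on both sides, and `(?i)` makes it case-insensitive. The records, the `label` field, and the `match_nodes` helper are assumptions for illustration only, not actual Hetionet data or API.

```python
import re

# Hypothetical toy records; these are NOT actual Hetionet nodes.
nodes = [
    {"label": "SideEffect", "name": "Fatigue"},
    {"label": "Symptom", "name": "Fatigue"},
    {"label": "SideEffect", "name": "Muscle fatigue"},
    {"label": "Disease", "name": "asthma"},
]

# Same pattern as in the Cypher query: case-insensitive, "fatigue" anywhere.
pattern = re.compile(r"(?i).*fatigue.*")


def match_nodes(records, regex):
    """Return records whose full name matches the regex (like Cypher's =~)."""
    return [record for record in records if regex.fullmatch(record["name"])]


matches = match_nodes(nodes, pattern)
# Collect the distinct node types among the matches.
labels = sorted({record["label"] for record in matches})
```

In this toy data, `labels` contains both the side effect and symptom types, which mirrors what the browser query shows: the same name living under more than one node type.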


You can learn more about how we created our disease catalog in this discussion. Briefly quoting from the manuscript:

We selected 137 terms from the Disease Ontology (which we refer to as DO Slim) as our disease set. Our goal was to identify complex diseases that are distinct and specific enough to be clinically relevant yet general enough to be well annotated. To this end, we included diseases that have been studied by GWAS and cancer types from TopNodes_DOcancerslim. We ensured that no DO Slim disease was a subtype of another DO Slim disease.

So fatigue and sepsis did not make this cut.
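The constraint quoted above, that no DO Slim disease is a subtype of another, can be checked mechanically. The following is a rough sketch under stated assumptions: the `parents` mapping is a tiny invented hierarchy, not the real Disease Ontology, and `ancestors` and `subtype_conflicts` are hypothetical helper names rather than functions from the Hetionet codebase.

```python
# Hypothetical toy hierarchy: term -> set of direct parent terms.
# This is NOT the real Disease Ontology.
parents = {
    "lung cancer": {"cancer"},
    "breast cancer": {"cancer"},
    "cancer": {"disease"},
    "asthma": {"disease"},
    "disease": set(),
}


def ancestors(term, parents):
    """Return every ancestor of a term by walking parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def subtype_conflicts(selected, parents):
    """Return (term, ancestor) pairs where both are in the selected set."""
    selected = set(selected)
    return sorted(
        (term, ancestor)
        for term in selected
        for ancestor in ancestors(term, parents) & selected
    )


ok_slim = ["lung cancer", "breast cancer", "asthma"]
bad_slim = ["lung cancer", "cancer", "asthma"]

ok_conflicts = subtype_conflicts(ok_slim, parents)
bad_conflicts = subtype_conflicts(bad_slim, parents)
```

An empty result means the selection satisfies the constraint. In the second selection, "lung cancer" and "cancer" conflict because one is a subtype of the other.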

More generally, one could allow a single node to have multiple types (or labels, in Neo4j parlance). For simplicity, we did not allow this when building Hetionet. The number of nodes that this affects is relatively low, and the one-type-per-node assumption helped simplify the metagraph and computations.
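To illustrate the difference between the two designs, here is a small sketch that takes nodes stored one type per node and collapses them into multi-label records. It assumes, naively, that nodes sharing a case-insensitive name represent the same concept. That assumption, the toy records, and the `merge_by_name` helper are all hypothetical and not part of Hetionet.

```python
from collections import defaultdict

# Hypothetical one-type-per-node records: (node type, name).
# These are NOT actual Hetionet nodes.
single_type_nodes = [
    ("SideEffect", "Fatigue"),
    ("Symptom", "Fatigue"),
    ("Disease", "asthma"),
]


def merge_by_name(nodes):
    """Group node types under a lowercased name (naive concept matching)."""
    labels_by_name = defaultdict(set)
    for node_type, name in nodes:
        labels_by_name[name.lower()].add(node_type)
    return dict(labels_by_name)


# Multi-label view: one entry per concept, each with a set of types.
multi_label = merge_by_name(single_type_nodes)
```

In the multi-label view, "fatigue" becomes one entry carrying two types, whereas the one-type-per-node view keeps two separate nodes, each of which maps to exactly one place in the metagraph.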