dhimmel / integrate

Scripts and resources to create Hetionet v1.0, a heterogeneous network for drug repurposing
https://doi.org/10.15363/thinklab.4

Bias in Anatomy–downregulates–Gene and Anatomy–upregulates–Gene edges #14

Closed: annabreit closed this issue 5 years ago

annabreit commented 5 years ago

Hi Daniel,

If I understand correctly, you used the over-/under-expression information from Bgee to create your Anatomy–downregulates–Gene and Anatomy–upregulates–Gene edges [1]. The values provided by Bgee cover over-/under-expression across anatomy but also over-/under-expression across life stages [2]. However, you use only data from adults and therefore do not record the life stage in your anatomy nodes. In your opinion, could this bias your network, or otherwise affect the network or the predictions made from it?

annabreit commented 5 years ago

Never mind, I just found that it was possible to distinguish between these two cases in version 13 (which you used). That wasn't clear from the documentation provided by Bgee.

dhimmel commented 5 years ago

Thanks @annabreit for your comments.

Looking back at this excerpt, you are correct that the Anatomy-expresses-Gene edges in the network were derived from adult samples only:

Using the simple dataset, I found all gene–anatomy pairs where Expression is present and Call quality is high quality for any adult developmental stage. To identify adult developmental stages, I filtered for HsapDv:0000087 and its descendants.

The Anatomy-up/down-regulates-Gene edges are not just from adult samples, if I understand this excerpt correctly:

Bgee provides calls of over-/under-expression. A call corresponds to a gene, with significant variation of its level of expression, in an anatomical entity during a developmental stage, as compared to, either: i) other anatomical entities at the same (broadly defined) developmental stage (over-/under-expression across anatomy);

Whether the developmental stage of expression samples affects Hetionet and its applications, I am not sure. While developmental stage likely is important for expression, the high level of noise in expression-derived data may overwhelm the bias of focusing only on adult stages. The effect probably depends on the application. For Project Rephetio, most of the diseases were adult diseases. Furthermore, paths containing Anatomy-expresses-Gene edges played a limited role in our predictions.

Never mind, I just found that it was possible to distinguish between these two cases in version 13 (which you used). That wasn't clear from the documentation provided by Bgee.

Hopefully, you can filter for the stages you'd like for your application.
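
As a rough illustration of the adult-stage filter described above, and of how a different stage could be swapped in, here is a minimal Python sketch. It is not the code used to build Hetionet. The field names (gene_id, anatomy_id, stage_id, expression, call_quality), the toy children mapping, and the TOY: stage IDs are hypothetical placeholders; only HsapDv:0000087 and the "present" / "high quality" criteria come from the discussion.

```python
from collections import deque


def descendants(root, children):
    """Return the root term plus every term reachable through child links."""
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_calls(rows, stage_root, children):
    """Collect (gene, anatomy) pairs with a high-quality 'present' call
    at stage_root or any of its descendant stages."""
    stages = descendants(stage_root, children)
    pairs = set()
    for row in rows:
        if (
            row["stage_id"] in stages
            and row["expression"] == "present"
            and row["call_quality"] == "high quality"
        ):
            pairs.add((row["gene_id"], row["anatomy_id"]))
    return pairs


# Toy stage hierarchy (hypothetical IDs, except the adult root from the thread).
children = {
    "HsapDv:0000087": ["TOY:adult_a", "TOY:adult_b"],
    "TOY:adult_a": ["TOY:adult_a1"],
}

# Toy expression calls (hypothetical field names and values).
rows = [
    {"gene_id": "G1", "anatomy_id": "A1", "stage_id": "TOY:adult_a1",
     "expression": "present", "call_quality": "high quality"},
    {"gene_id": "G1", "anatomy_id": "A1", "stage_id": "TOY:adult_b",
     "expression": "present", "call_quality": "high quality"},
    {"gene_id": "G2", "anatomy_id": "A1", "stage_id": "TOY:embryo",
     "expression": "present", "call_quality": "high quality"},
    {"gene_id": "G3", "anatomy_id": "A2", "stage_id": "HsapDv:0000087",
     "expression": "present", "call_quality": "low quality"},
]

adult_pairs = filter_calls(rows, "HsapDv:0000087", children)
print(sorted(adult_pairs))
```

Passing a different root to filter_calls (for example the toy "TOY:embryo" term) selects calls for that stage and its descendants instead, which is one way to restrict edges to the stages relevant to a given application.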