biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

express correlation between blood analytes and gut microbiome #888

Closed sierra-moxon closed 2 years ago

sierra-moxon commented 2 years ago

We have a use case from Arpita J. (Multiomics Provider) to express correlations between blood analytes (or some other entity) and the gut microbiome. They have KEGG ortholog identifiers for the data representing the gut microbiome.

Perhaps we are trying to say "this KEGG function is correlated with the gut." This data is from the ISB Wellness data set.

It would be helpful to understand the kinds of searches intended to bring back this data, so we can make sure we're modeling the critical bits that those searches would find. For example, is the "important" part of this statement that the KEGG function is correlated with the gut specifically, that the KEGG function is correlated with a specific phenotype like "increased hemoglobin" (made-up example phenotype), or even that the KEGG function is associated with the gut microbiome? (Or probably all three! :))

Tagging @aj95b (generator of this question) and @karthiksoman (SPOKE representative) for their feedback.

realmarcin commented 2 years ago

Hi Sierra, I'm involved in metagenome knowledge modeling under other hats, so I'm jumping in with my 2 cents. It's still early days for the field! It looks like the statement is something like: "presence of KO function X in the gut microbiome (samples 1, 2, 3) correlates with change in blood metabolite Y"?

For metagenomics these statements are almost always still correlative, but often that is not stated explicitly, which generates a lot of confusion. A rare example of a more causal finding is the A20 locus and allergies: https://science.sciencemag.org/content/349/6252/1106
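One way to picture a statement of this shape is as a single edge record that keeps the correlative nature explicit. This is only a rough sketch: the class, its field names, and the placeholder identifiers (`KO:X`, `ANALYTE:Y`) are hypothetical and are not taken from biolink-model or the ISB data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationEdge:
    """Hypothetical record for 'KO function X correlates with analyte Y'."""
    subject: str      # placeholder KEGG ortholog identifier
    predicate: str    # kept explicit so correlation is never read as causation
    object_: str      # placeholder blood analyte identifier
    context: str      # where the KO function was observed
    samples: tuple = ()

    def is_causal(self) -> bool:
        # In this sketch, only a non-correlative predicate counts as causal.
        return self.predicate != "correlated_with"


edge = CorrelationEdge(
    subject="KO:X",
    predicate="correlated_with",
    object_="ANALYTE:Y",
    context="gut microbiome",
    samples=("sample1", "sample2", "sample3"),
)
print(edge.is_causal())
```

Carrying the context and sample list on the edge, rather than folding them into the subject, would let a search match on the KO function, the analyte, or the gut microbiome independently.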

aj95b commented 2 years ago

ISB's entire wellness dataset is composed of samples from a cohort of largely healthy individuals, so as of now we are not looking at over-representation of a phenotype. We only want to present the gut microbiome's statistical correlation with blood analytes in this cohort.

karthiksoman commented 2 years ago

Given the description of the data, what I understand is that the wellness dataset gives statistical correlations between KO functions and blood analytes from a cohort of healthy individuals. In SPOKE, we have modelled microbiome pathways as nodes from KEGG modules instead of KOs. However, these pathway nodes do not have a direct connection with Compounds (in this case, analytes). Instead the path goes: Pathway --> EC --> Reaction --> Compound. So, I think one can trace this path in the knowledge network and compare it against the statistical correlation observed in the wellness dataset. It would be nice to bring Sergio into this discussion to get more ideas on the kinds of searches possible on the wellness dataset.
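The Pathway --> EC --> Reaction --> Compound trace described above can be sketched as a small graph walk. This is a toy illustration, not SPOKE's actual API or schema: the adjacency dictionary, the node names, and the `compounds_reachable` helper are all hypothetical.

```python
from collections import deque

# Hypothetical toy graph; node names are placeholders, not real SPOKE identifiers.
EDGES = {
    "Pathway:P1": ["EC:E1"],
    "EC:E1": ["Reaction:R1"],
    "Reaction:R1": ["Compound:C1", "Compound:C2"],
}


def compounds_reachable(start, edges):
    """Breadth-first walk returning (compound, path) pairs reachable from start."""
    results = []
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.startswith("Compound:"):
            # Stop at compounds: they are the analyte-like end of the path.
            results.append((node, path))
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return results


paths = compounds_reachable("Pathway:P1", EDGES)
for compound, path in paths:
    print(compound, " --> ".join(path))
```

Compounds reached this way could then be compared with the analytes that the wellness dataset reports as correlated, to see whether the network offers a path consistent with the observed correlation.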

nlharris commented 2 years ago

Will discuss during Biolink Help Desk on Oct 11.

sierra-moxon commented 2 years ago

Notes from our biolink-model helpdesk call:

[Screenshot of notes, 2021-10-11, 11:10:11 AM]

Notes from predicates WG meeting:

[Screenshot of notes, 2021-10-11, 11:04:13 AM]

Action items:

Also: automate the addition of feature variables and statistical attributes (with definitions) from the EPC modeling worksheet to biolink-model (three here, will come in a different ticket).