franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

How to calculate the significance of differences in SMETANA results #114

Closed: jllhghf closed this issue 1 year ago

jllhghf commented 1 year ago

Hi Francisco :),

Thank you for your outstanding work. I'm attempting to use this pipeline to analyze the gut microbiome, but I have some questions about the SMETANA results.

  1. In the manuscript and in your answers to #13 and #84, you only used medium M11 to calculate the significance of differences between groups. Do I need to use different media to calculate the significance of differences between groups when I don't know which type of metabolism is involved in the interactions?

  2. In #84, the SMETANA results contain repeated metabolites (with equal or unequal values) within the same group. How did you remove the duplicated metabolites when you ran the Wilcoxon rank test?

Best, Lucia

franciscozorrilla commented 1 year ago

Hi Lucia,

Happy to hear that you find metaGEM useful. If you have not already, please also read the CarveMe and SMETANA papers, which describe their implementation and methods in detail.

  1. In the manuscript and in your answers to https://github.com/franciscozorrilla/metaGEM/issues/13 and https://github.com/franciscozorrilla/metaGEM/issues/84, you only used medium M11 to calculate the significance of differences between groups. Do I need to use different media to calculate the significance of differences between groups when I don't know which type of metabolism is involved in the interactions?

I am not sure what the question is here; could you try to rephrase it? I can tell you that in the paper we gapfilled the models using dGMM + LAB medium (M3) and then predicted interactions under dGMM + LAB excluding aromatic amino acids (M11), from this publication (see Fig. 1, panel d). To find statistically significant differences across disease states, we grouped the SMETANA scores by receiver, donor, and compound. The choice of media used for gapfilling and simulation will affect the interactions predicted between the models, so this is something you should consider in the context of your research question.
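The grouping step described above can be sketched in Python. This is a minimal illustration, not metaGEM's own code: it assumes a merged tab-separated SMETANA table with columns named `community`, `medium`, `receiver`, `donor`, `compound` and `smetana`, plus a hypothetical `condition` column (e.g. NGT/IGT/T2D) joined from sample metadata. The rows are made-up toy values, and `group_scores` is a hypothetical helper name. Filtering to a single simulation medium keeps scores from different media out of the same comparison.

```python
import csv
import io
from collections import defaultdict

# Made-up toy rows. Column names are assumptions; "condition" is a
# hypothetical column joined from sample metadata (e.g. NGT/IGT/T2D).
TOY_TSV = (
    "community\tcondition\tmedium\treceiver\tdonor\tcompound\tsmetana\n"
    "s1\tNGT\tM11\tbinA\tbinB\tM_ac_e\t0.40\n"
    "s2\tNGT\tM11\tbinA\tbinB\tM_ac_e\t0.55\n"
    "s1\tNGT\tother_medium\tbinA\tbinB\tM_ac_e\t0.90\n"
    "s3\tT2D\tM11\tbinA\tbinB\tM_ac_e\t0.10\n"
    "s3\tT2D\tM11\tbinA\tbinC\tM_ac_e\t0.20\n"
)


def group_scores(tsv_text, medium="M11"):
    """Map (receiver, donor, compound) -> condition -> list of SMETANA scores.

    Rows simulated under any other medium are skipped, so scores from
    different media never end up in the same comparison.
    """
    groups = defaultdict(lambda: defaultdict(list))
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["medium"] != medium:
            continue
        key = (row["receiver"], row["donor"], row["compound"])
        groups[key][row["condition"]].append(float(row["smetana"]))
    return groups


groups = group_scores(TOY_TSV)
for key, by_condition in groups.items():
    print(key, dict(by_condition))
```

Each key then holds one list of scores per condition, which is the shape a per-group statistical test needs.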

  2. In https://github.com/franciscozorrilla/metaGEM/issues/84, the SMETANA results contain repeated metabolites (with equal or unequal values) within the same group. How did you remove the duplicated metabolites when you ran the Wilcoxon rank test?

Could you please share an example of your SMETANA output showing duplicated metabolism/metabolites? This should not happen. In the issue that you cite, smet_all.tsv does not contain any duplicated entries. A metabolite is likely exchanged by multiple community members, so it may appear multiple times for a community, but each occurrence should differ in receiver, donor, or media condition; that is why the SMETANA scores can differ if you only look at the metabolite column. Remember that a SMETANA score is always associated with a donor providing a metabolite to a receiver under a given media composition. In fact, if you did not have multiple SMETANA values for each condition (i.e. NGT, IGT, T2D) and donor/receiver/metabolite combination, you would not be able to perform a statistical test 🤓
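To check whether a table really contains duplicates, count repeats of the full key that identifies a score rather than of the compound alone. The sketch below assumes rows shaped as (community, medium, receiver, donor, compound, score) tuples; the values and the `find_duplicates` helper are made up for illustration.

```python
from collections import Counter

# Toy, made-up rows: (community, medium, receiver, donor, compound, score).
rows = [
    ("s1", "M11", "binA", "binB", "M_ac_e", 0.40),
    ("s1", "M11", "binA", "binC", "M_ac_e", 0.25),  # same compound, other donor
    ("s1", "M11", "binC", "binB", "M_ac_e", 0.10),  # same compound, other receiver
]


def find_duplicates(rows):
    """Return (community, medium, receiver, donor, compound) keys seen more than once.

    A score is identified by the whole key, so only repeats of the full
    key (not of the compound alone) count as true duplicates.
    """
    counts = Counter(row[:5] for row in rows)
    return [key for key, n in counts.items() if n > 1]


compound_counts = Counter(row[4] for row in rows)
print(compound_counts["M_ac_e"])  # 3: the compound appears three times
print(find_duplicates(rows))      # []: but no full key is repeated
```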

Please have a look at this comment, as it clearly describes how to go from the SMETANA output to the statistical tests and associated figures.
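Once scores are grouped per (receiver, donor, compound), each group can be compared between two conditions with a rank-based test. The sketch below is a self-contained standard-library Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with a tie correction; it is an illustrative substitute, not necessarily the exact procedure described in the linked comment, and the function names and scores are hypothetical. For real analyses an established implementation such as `scipy.stats.mannwhitneyu` or R's `wilcox.test` is preferable, and because one test is run per group, the p-values usually need a multiple-testing correction (for example Benjamini-Hochberg).

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction,
    so p-values are only approximate for small samples.
    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each block of tied values.
    tie_term = sum(ranks.count(r) ** 3 - ranks.count(r) for r in set(ranks))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u1, p


ngt = [0.40, 0.55, 0.60, 0.70]  # toy scores for one group, condition A
t2d = [0.10, 0.15, 0.20, 0.30]  # toy scores for the same group, condition B
u, p = rank_sum_test(ngt, t2d)
print(f"U = {u}, p = {p:.3f}")  # U = 16.0, p roughly 0.02
```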

Hope this helps and best wishes, Francisco

jllhghf commented 1 year ago

Hi Francisco,

Thank you very much for the detailed response.

I think I was mistaken there. There are no duplicate values because, as you said, receiver and donor are two additional variables that define each score when we calculate the significance of metabolites.

Your comment helped me understand. Thanks 😃.

Best, Lucia