FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7

Help with ancombc2 analysis with repeated measures #259

Open tillenglert opened 7 months ago

tillenglert commented 7 months ago

Hi all,

First of all, thank you for the great tool and very extensive tutorial on ANCOM-BC2. Still, I’m not quite sure how to interpret and define my model for my analysis!

For better understanding this is the data I have:

We sequenced (16S) 450 stool samples from 150 participants of an intervention study over 8 weeks. Each participant performed one of two training methods over these 8 weeks after being physically inactive for a year. They provided us with extensive metadata (demographic data like age and sex, diet data, medications, etc.) and 3 stool samples (week 0: before the training started, week 4 of training and week 8 of training). With each stool sample, an additional questionnaire was filled in with information on what they ate over the preceding 4 weeks. They also took part in a fitness test, giving rise to several measurements of the weights they can lift in different exercises.

So, in summary, we have 150 participants split between 2 training methods, and each provided us with samples at 3 timepoints plus the respective metadata.
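
To make the design concrete, the sample metadata can be pictured as a long-format table with one row per stool sample. The following is a minimal sketch in base R with simulated values only: the column names are taken from the ancombc2 call later in this post, while the participant IDs, the labels `MethodA` and `MethodB`, and the even split between the two methods are assumptions made purely for illustration.

```r
# Hypothetical long-format sample metadata: one row per stool sample.
# Column names follow the ancombc2 call in this post; all values are simulated.
set.seed(1)
n_participants <- 150
weeks <- c(0, 4, 8)

# One row per participant. The even split between methods is an assumption.
participants <- data.frame(
  Patient_Id_Cat = factor(sprintf("P%03d", seq_len(n_participants))),
  Sex = factor(sample(c("F", "M"), n_participants, replace = TRUE)),
  Training = factor(rep(c("MethodA", "MethodB"), length.out = n_participants))
)

# Cross join with the timepoints: 150 participants x 3 weeks = 450 samples.
meta <- merge(participants, data.frame(Timepoint_num = weeks))
meta <- meta[order(meta$Patient_Id_Cat, meta$Timepoint_num), ]
rownames(meta) <- paste0(meta$Patient_Id_Cat, "_w", meta$Timepoint_num)

# Sex and Training are constant within a participant; only the timepoint varies.
table(meta$Training, meta$Timepoint_num)
```

In a layout like this, `Sex` and `Training` never change within a participant, so the participant ID alone already identifies the repeated-measures grouping.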

We have the following questions:

  1. We want to analyse whether there are differences in the microbiome that are induced by the training after 4 and 8 weeks. How would we define the random and fixed formulas for this problem (first try see below)?
  2. How would we need to change the formula to capture the effect of the training method (a categorical variable) on the microbiome?
  3. We want to include more metadata like diet (continuous and/or categorical data), medication (categorical), etc. Where are they defined? Can we append them to the random formula by x:x?
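
For reference, the `:` and `*` operators in these questions follow standard R formula syntax, and base R can show which terms a candidate fixed-effects formula expands to. This is only a sketch of the formula syntax; whether ancombc2 accepts an interaction term in `fix_formula` is not confirmed here and should be checked against the package documentation.

```r
# Base R only: expand candidate fixed-effects formulas into their terms.
# The variable names are the ones used in the ancombc2 call in this post.
main_only <- ~ Timepoint_num + Sex + Training
with_interaction <- ~ Timepoint_num * Training + Sex

# Main effects only: one term per variable.
main_terms <- attr(terms(main_only), "term.labels")
print(main_terms)

# `*` expands to both main effects plus the Timepoint_num:Training interaction,
# i.e. a separate time trend per training method.
interaction_terms <- attr(terms(with_interaction), "term.labels")
print(interaction_terms)
```

The interaction term is the one that would express a training-method-specific change over time, as opposed to an overall shift between the two methods.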

I already tested some formulas and analysis strategies to find differences induced by the training, including sex and training method, but am stuck with the following call:

ancombc2(data = tse,
         assay_name = "counts",
         tax_level = "sequence",
         fix_formula = "Timepoint_num + Sex + Training",
         rand_formula = "(Timepoint_num | Patient_Id_Cat:Sex:Training)",
         p_adj_method = "holm",
         pseudo_sens = TRUE,
         prv_cut = 0.2,
         lib_cut = 100,
         neg_lb = TRUE,
         verbose = TRUE,
         lme_control = lme4::lmerControl(),
         mdfdr_control = list(fwer_ctrl_method = "holm", B = 100)
         )

I based my call on the tutorial section for longitudinal data and got results, i.e. taxa that are differentially abundant across timepoints. However, I am unsure whether this is in fact the right formula to search for the differences induced by the training, for two reasons: first, the log fold changes are only in the range of -0.3 to 0.3, and second, I added the timepoints to the random formula and am not sure whether that is the correct way of defining the random effects.

I would be very grateful for your help, and if you need any other info to help me, please reach out!

All the best,

Till