Request to understand the LMMs used for alpha diversity, beta diversity and differential abundance

OrsonMM commented 2 days ago

Dear team MicrobiomeStat,

I very much appreciate your software contribution. I am new to linear mixed models. Could you tell me whether I am using my data correctly?

In my experiment, I have these variables:

asv: taxonomic abundances from the DADA2 output
treat: 4 different treatments (A, B, C and D)
time: 3 different time points (1, 2, 3)
sample_treatment_time: 5 independent samples per treatment, each resampled at every time point (60 samples in total)

My question is which ASVs in the community are affected by Treat, Time, or their interaction (Treat:Time).

I entered my variables into the MicrobiomeStat model as follows:

group.var = Treat
subject.var = sample_treatment_time
time.var = Time

Could you explain what the model equation looks like?

From the manual, I am not sure whether the same model is used for alpha diversity, beta diversity and differential abundance of ASVs.

Is it correct to use y ~ time.var + group.var + time.var:group.var + (1 | subject.var)?

Regards

cafferychen777 commented 2 days ago

Dear Orson,

Thank you for your interest in MicrobiomeStat and for reaching out with your question about Linear Mixed Models (LMM). We appreciate your detailed description of your experimental design.

From your description, I can see you have:

4 treatments (A, B, C, D)
3 time points
5 independent samples per treatment with replicates over time
A total of 60 samples

While the model formula you suggested (y ~ time.var + group.var + time.var:group.var + (1|subject.var)) is generally appropriate for longitudinal microbiome data analysis, to better assist you, could you please specify which MicrobiomeStat function(s) you are using?

Each function might have slightly different implementations to accommodate the specific needs of alpha diversity, beta diversity, and differential abundance analyses.

Once you clarify which function(s) you're working with, I can provide more specific guidance about the model implementation.

Best regards

OrsonMM commented 2 days ago

Hi Caffery Yang,

Thanks for the rapid response.

From your response, I understand that each function fits a different model equation. I have questions about these functions:

1. Alpha diversity

```
alpha_time_diversity <- generate_alpha_trend_test_long(
  data.obj = rarefy_data_genus,
  alpha.name = c("shannon", "simpson", "observed_species", "chao1", "ace", "pielou"),
  depth = NULL,
  time.var = "Time",
  subject.var = "sample_treatment_time",
  group.var = "Treat",
  adj.vars = NULL
)
```

2. Beta diversity

```
beta_diversity <- generate_beta_trend_test_long(
  data.obj = rarefy_data_genus,
  dist.obj = NULL,
  subject.var = "sample_treatment_time",  # random effect: is this a random intercept or a random slope?
  time.var = "Time",                      # fixed effect
  group.var = "Treat",
  adj.vars = NULL,
  dist.name = c("Jaccard")
)
```


```
beta_diversity_volatility <- generate_beta_volatility_test_long(
  data.obj = rarefy_data_genus,
  dist.obj = NULL,
  subject.var = "sample_treatment_time",
  time.var = "Time",
  group.var = "Treat",
  adj.vars = NULL,
  dist.name = c("BC", "Jaccard", "UniFrac", "JS")
)
```

3. Differential abundance (DA)

Here, I preferred to use linda because I can specify the model formula myself (but I am not sure whether it is correct):

```
model_1 <- linda(
  feature.dat = genus_normalizated_data$feature.tab, meta.dat = genus_data$meta.dat,
  formula = '~ Time + Treat + Treat:Time + (1 | sample_treatment_time)',
  feature.dat.type = c('proportion'), prev.filter = 0.1, mean.abund.filter = 0, max.abund.filter = 0,
  is.winsor = TRUE, outlier.pct = 0.03, adaptive = TRUE,
  zero.handling = c('imputation'), pseudo.cnt = 0.5, corr.cut = 0.1,
  p.adj.method = "fdr", alpha = 0.05, n.cores = 20, verbose = TRUE
)
```

cafferychen777 commented 2 days ago

Hi Orson,

Thank you for your detailed follow-up questions about the model equations in MicrobiomeStat. I'll explain how each function implements its statistical models:

Alpha Diversity Analysis

For your alpha_time_diversity call, the function fits a linear mixed-effects model of the form:
```
alpha_diversity ~ Treat * Time + (1 + Time | sample_treatment_time)
```

This model includes:

Fixed effects: Treatment, Time, and their interaction (Treat * Time expands to Treat + Time + Treat:Time)
Random effects: a random intercept and a random slope for Time for each subject (sample_treatment_time)
This allows each subject to have its own baseline and its own trajectory over time
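As a rough illustration of what the fixed part `Treat * Time` estimates in this 4-treatment, 3-timepoint design, the base-R sketch below builds the design matrix with `model.matrix()` for a hypothetical metadata table (the `subject` column stands in for `sample_treatment_time`). It assumes nothing about MicrobiomeStat internals; it only shows that coding Time as numeric gives one slope per treatment (8 coefficients), while coding it as a factor gives a separate mean for every treatment-by-time cell (12 coefficients).

```r
# Hypothetical metadata mirroring the design: 4 treatments x 5 subjects x 3 time points
meta <- expand.grid(Time = 1:3, rep = 1:5, Treat = c("A", "B", "C", "D"))
meta$subject <- factor(paste(meta$Treat, meta$rep, sep = "_"))

# Time coded as numeric: intercept, 3 Treat contrasts, 1 Time slope, 3 Treat:Time slopes
X_num <- model.matrix(~ Treat * Time, data = meta)
colnames(X_num)
ncol(X_num)  # 8

# Time coded as a factor: a separate mean for each of the 4 x 3 treatment-time cells
X_fac <- model.matrix(~ Treat * factor(Time), data = meta)
ncol(X_fac)  # 12
```

Which coding a given MicrobiomeStat function uses depends on how `Time` is stored in the metadata, so it is worth checking `class(meta.dat$Time)` before interpreting the coefficients.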

Beta Diversity Analysis

For your beta_diversity call, the function tries two model structures, from more to less complex:

First tries:

```
Jaccard_distance ~ Treat * Time + (1 + Time | sample_treatment_time)
```

If that fails to converge, automatically simplifies to:

```
Jaccard_distance ~ Treat * Time + (1 | sample_treatment_time)
```
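The convergence fallback described above can be sketched as follows. This is not MicrobiomeStat's code: it uses `nlme::lme()` (shipped with standard R installations) as a stand-in, a simulated outcome `y` in place of the Jaccard-based response, and a hypothetical helper `fit_with_fallback()`. It tries a random intercept and slope for Time per subject and, if that fit throws an error, refits with a random intercept only.

```r
library(nlme)  # recommended package, included in standard R installations

set.seed(1)
# Hypothetical data: 4 treatments x 5 subjects x 3 time points = 60 rows
dat <- expand.grid(Time = 1:3, rep = 1:5, Treat = c("A", "B", "C", "D"))
dat$subject <- factor(paste(dat$Treat, dat$rep, sep = "_"))
# Simulated outcome: subject-specific intercept + common time trend + noise
subj_int <- rnorm(nlevels(dat$subject), sd = 0.5)
dat$y <- 0.2 * dat$Time + subj_int[as.integer(dat$subject)] + rnorm(nrow(dat), sd = 0.3)

# Hypothetical helper: random intercept + slope first, random intercept only on error
fit_with_fallback <- function(data) {
  tryCatch(
    lme(y ~ Treat * Time, random = ~ 1 + Time | subject, data = data),
    error = function(e) {
      message("Random-slope model failed: ", conditionMessage(e),
              "\nRefitting with a random intercept only")
      lme(y ~ Treat * Time, random = ~ 1 | subject, data = data)
    }
  )
}

fit <- fit_with_fallback(dat)
VarCorr(fit)  # a "Time" row appears only if the random slope was kept
fixef(fit)    # 8 fixed effects: intercept, 3 Treat, Time, 3 Treat:Time
```

With only three time points per subject, the random-slope variance is estimated from very little information, which is why a simpler fallback structure is a reasonable design choice.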

Your beta_diversity_volatility call performs a different type of analysis. It:

First calculates volatility (rate of change between consecutive timepoints) for each subject
Then fits a simple linear model: volatility ~ Treat
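A minimal base-R sketch of this two-step volatility analysis, assuming volatility is the mean Bray-Curtis distance between a subject's consecutive samples divided by the time elapsed between them; MicrobiomeStat's exact definition may differ. The abundance table is simulated, and `bray_curtis()` is a small helper written only for this example.

```r
set.seed(2)
# Hypothetical design and relative-abundance table: 60 samples x 10 taxa
meta <- expand.grid(Time = 1:3, rep = 1:5, Treat = c("A", "B", "C", "D"))
meta$subject <- paste(meta$Treat, meta$rep, sep = "_")
counts <- matrix(rpois(nrow(meta) * 10, lambda = 20), nrow = nrow(meta))
props  <- counts / rowSums(counts)

# Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Step 1 (assumed definition): per subject, mean distance between consecutive
# time points divided by the time elapsed between them
subj_vol <- sapply(split(seq_len(nrow(meta)), meta$subject), function(idx) {
  idx <- idx[order(meta$Time[idx])]
  steps <- seq_len(length(idx) - 1)
  d  <- sapply(steps, function(k) bray_curtis(props[idx[k], ], props[idx[k + 1], ]))
  dt <- diff(meta$Time[idx])
  mean(d / dt)
})

# Step 2: one volatility value per subject, compared across treatments
vol <- data.frame(
  subject    = names(subj_vol),
  volatility = unname(subj_vol),
  Treat      = factor(sub("_.*", "", names(subj_vol)))
)
fit_vol <- lm(volatility ~ Treat, data = vol)
anova(fit_vol)
```

Because each subject is collapsed to a single volatility value, no random effect is needed in the second step; the 20 subjects become 20 independent observations.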
Differential Abundance Analysis (linda)

Your formula is well-structured:
```
abundance ~ Time + Treat + Treat:Time + (1 | sample_treatment_time)
```

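This model:

Tests the main effects of Time and Treatment
Tests their interaction
Includes a random intercept for each subject (sample_treatment_time)
linda also applies a centered log-ratio (CLR) transformation to the abundances and handles zeros and outliers according to the zero.handling, pseudo.cnt, is.winsor and outlier.pct arguments in your call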
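For reference, the CLR transform subtracts each sample's mean log-abundance from its log-abundances, so the transformed values of every sample sum to zero. The sketch below adds a simple pseudo-count of 0.5 before taking logs, for illustration only; linda's own zero handling (here `zero.handling = 'imputation'`) and any later adjustments it makes to the fitted coefficients are not reproduced. The count matrix is hypothetical.

```r
# Centered log-ratio (CLR) transform of one sample's abundance vector;
# a pseudo-count is added first so that zeros can be logged
clr <- function(x, pseudo = 0.5) {
  lx <- log(x + pseudo)
  lx - mean(lx)
}

# Hypothetical counts: 2 samples (rows) x 4 taxa (columns)
counts <- matrix(c(10, 0, 5, 85,
                   3, 7, 0, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), paste0("taxon", 1:4)))

# apply() returns taxa x samples, so transpose back to samples x taxa
clr_tab <- t(apply(counts, 1, clr))
round(rowSums(clr_tab), 12)  # each sample's CLR values sum to zero
```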

Some suggestions for your analysis:

For the alpha and beta trend analyses, the default inclusion of random slopes is appropriate for longitudinal data, but the model may fail to converge with only 3 timepoints. If that happens, the functions automatically simplify to random intercepts.
Make sure your "sample_treatment_time" variable identifies each independent sample (subject) that is measured repeatedly: each subject should keep the same identifier at all of its timepoints. If the identifier also encodes the time point, every value occurs only once and the random effect cannot be separated from the residual error.
For linda, you could consider matching the alpha/beta diversity models by using:
```
~ Time + Treat + Treat:Time + (1 + Time | sample_treatment_time)
```
That said, your current random-intercept model is also valid.
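The identifier check from the second suggestion can be done in base R before running any of the functions. In this sketch the metadata is hypothetical and `check_subject_ids()` is an illustrative helper: a good ID column has each subject exactly once per time point, whereas an ID that also encodes the time point is unique on every row.

```r
# Hypothetical metadata: 20 subjects (5 per treatment) x 3 time points
meta <- data.frame(
  sample_treatment_time = rep(paste0("S", 1:20), each = 3),
  Time  = rep(1:3, times = 20),
  Treat = rep(c("A", "B", "C", "D"), each = 15)
)

# Hypothetical helper: summarise how a subject ID column relates to time
check_subject_ids <- function(meta, subject, time) {
  tab <- table(meta[[subject]], meta[[time]])
  list(
    n_subjects     = nrow(tab),
    one_per_time   = all(tab == 1),                       # each subject seen once at every time point
    unique_per_row = anyDuplicated(meta[[subject]]) == 0  # TRUE means no repeated measures at all
  )
}

good <- check_subject_ids(meta, "sample_treatment_time", "Time")

# Counter-example: an ID that also encodes the time point
bad_meta <- meta
bad_meta$sample_treatment_time <- paste(meta$sample_treatment_time, meta$Time, sep = "_")
bad <- check_subject_ids(bad_meta, "sample_treatment_time", "Time")

good
bad
```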

Overall, your implementation looks appropriate for your experimental design (4 treatments, 3 timepoints, 5 independent samples per treatment followed over time). Let me know if you need clarification on any specific aspect of these models.

Best regards, Chen

cafferychen777 commented 2 days ago

PS: I'd like to encourage you to explore MicrobiomeStat's rich visualization capabilities to complement your statistical analyses.

OrsonMM commented 2 days ago

I really appreciate your help, @cafferychen777.

cafferychen777 / MicrobiomeStat, issue #67