is there a gllvm way to do the equivalent of mvabund::anova.manyglm()?

JenniNiku / gllvm

Generalized Linear Latent Variable Models

https://jenniniku.github.io/gllvm/

48 stars 19 forks source link

is there a gllvm way to do the equivalent of mvabund::anova.manyglm()? #107

Closed dougwyu closed 1 year ago

dougwyu commented 1 year ago

Hi again,

I'm curious if there is a gllvm way to test for a covariate effect on difference in composition, equivalent to mvabund::anova.manyglm() and producing a p-value? I've been trying to use mvabund for this, but there are some limitations in study design.

thanks, doug

BertvanderVeen commented 1 year ago

Not equivalent, no. There is gllvm::anova, but it is not very helpful to compare models where the number of parameters is drastically different. Mvabund relies on re-sampling, and that is just something that is not really doable with gllvm.

Primarily for computational reasons, but also due to the sensitivity to starting values of the algorithm. You might want to have a look at summary(model) as an alternative, which produces wald-statistics per species and predictor with accompanying p-values.

dougwyu commented 1 year ago

thanks for that. Do you mean something like the following? (based on your example)

data(antTraits)
y <- as.matrix(antTraits$abund)
X <- scale(as.matrix(antTraits$env))
TR <- antTraits$traits

fit_1 <- gllvm(y, X, TR, family = "negative.binomial", 
               formula = y ~ Bare.ground)
#with trait argument defined

fit_2 <- gllvm(y, X, family = "negative.binomial", 
               formula = y ~ Bare.ground)
#without trait argument defined

summary(fit_1)
summary(fit_2)

and by not including an interaction term in the formula, I get a main effect p-value. How should I think about this?

> summary(fit_1)

Call:
gllvm(y = y, X = X, TR = TR, formula = y ~ Bare.ground, family = "negative.binomial")

Family:  negative.binomial 

AIC:  4048.171 AICc:  4098.988 BIC:  4886.993 LL:  -1860.1 df:  164 

Informed LVs:  0 
Constrained LVs:  0 
Unconstrained LVs:  2 

Formula:  ~Bare.ground 
LV formula:  ~ 0 

Coefficients predictors:
            Estimate Std. Error z value Pr(>|z|)    
Bare.ground   0.1840     0.0559   3.291    0.001 ***

BertvanderVeen commented 1 year ago

General information on the wald test can be found here: https://en.wikipedia.org/wiki/Wald_test. But in short; you can use the p-value to assess if there is evidence for a certain effect (in example that is the amount of bare soil) on the response variable.

tanharri commented 1 year ago

Hi @BertvanderVeen. In the above example, why does defining the trait matrix TR result in a different model when the formula specified is the same?

BertvanderVeen commented 1 year ago

Trait models in gllvm work quite differently from models without traits. Even when the traits are not specified in the model formula, by specifying the trait matrix in the model the "trait-route" is taken. The trait model with species-specific predictor effects is IIRC unidentifiable, so when taking the "trait-route" the effect needs to be over the whole community, while without traits it it can be species-specific.

tanharri commented 1 year ago

I can populate the field with a dummy matrix to get this, but is there a syntax to retrieve the whole community effect without defining a TR trait matrix?

BertvanderVeen commented 1 year ago

Not that I know of, but perhaps @JenniNiku.

BertvanderVeen commented 1 year ago

@hjftan-nm I had a quick think about this. Here is an example to do what you ask:

> data(spider)
> model <- gllvm(spider$abund, spider$x, spider$trait ,formula = y ~ as.factor(species):bare.sand,  family='poisson')