TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling
https://cran.r-project.org/web/packages/sjSDM/index.html
GNU General Public License v3.0
68 stars 14 forks source link

Question: importance() x anova() #101

Open wevertonbio opened 2 years ago

wevertonbio commented 2 years ago

Hi,

I'm trying to extract the three components of metacommunity internal structure (Environment, spatial and biotic), the same generated by plot(anova(model), internal= TRUE).

I know I can get it using:

an <- anova(model)
plot.anova <- plot(an, internal = TRUE)
comp.species <- plot.anova$data$Species  #individual contributions to the species
comp.sites <- plot.anova$data$Species  # individual contributions to the sites

I also tried it using the function importance(model), however it gives a different result. What is the difference between them?

I also noticed that the R2 from plot.anova$data$Species$r2 are different from R2 from an$species$R2_Nagelkerke$Full (whereby an = anova(model)). Where do the R2 in plot(anova (model), internal = TRUE) come from?

To illustrate my question, here's a reproducible example:

library(sjSDM)

#Simulate data
set.seed(42)
community <- simulate_SDM(sites = 100, species = 10, env = 3)
Env <- community$env_weights
Occ <- community$response
SP <- matrix(rnorm(200, 0, 0.3), 100, 2) # spatial coordinates (no effect on species occurences)

model <- sjSDM(Y = Occ, env = linear(data = Env, formula = ~X1+X2+X3), spatial = linear(data = SP, formula = ~0+X1:X2), se = TRUE, family=binomial("probit"), sampling = 100L)
summary(model)

#Anova
an <- anova(model)
plot.anova <- plot(an, internal = TRUE)
comp.sites <- plot.anova$data$Sites # individual contributions to the sites
comp.species <- plot.anova$data$Species #individual contributions to the species
head(comp.species) #It is different from head(df.imp)
         env          spa     codist         r2
1 0.11019893 1.074359e-02 0.08830272 0.02129541
2 0.03653703 3.177195e-03 0.08606563 0.01213296
3 0.16736133 7.431965e-05 0.08027697 0.02339748
4 0.01981073 8.094543e-03 0.08831885 0.01023774
5 0.09569650 5.129817e-03 0.09061546 0.01981498
6 0.34139446 3.530965e-03 0.09031628 0.04211111

#Importance
imp.m <- importance(model)
df.imp <- data.frame(Environment = imp.m$res$total$env, #Extract data to dataframe
                     Spatial = imp.m$res$total$spatial,
                     Biotic = imp.m$res$total$biotic,
                     r2 = an$species$R2_Nagelkerke$Full)
head(df.imp) #It is different from head(comp.species)
Environment      Spatial     Biotic        r2
1   0.7858316 2.341709e-03 0.21182665 0.2463566
2   0.9796516 7.588521e-03 0.01275981 0.1533345
3   0.8616773 2.883220e-07 0.13832229 0.2769410
4   0.9315547 3.222125e-02 0.03622399 0.1279683
5   0.9006717 2.776958e-03 0.09655125 0.2372294
6   0.9318694 7.699431e-04 0.06736068 0.4413699
MaximilianPi commented 2 years ago

Hi @WevertonBio,

Difference between importance and anova:

The overall structure should be similar between the importance and the anova function, however, our current anova is a type I anova where the unique components (A, B, and S) are affected by collinearity (

image

) i.e. the individual components are not corrected for the shared contributions (the spatial only model will estimate non-zero effects for space because the second spatial predictor is weakly correlated with the third environmental predictor) which is not an issue for the importance(...) where we use the full model (so the confounding effect of the shared contributions is accounted for, i.e. the spatial estimates are correctly estimated to zero). We are currently working on type II anova to improve this behaviour.

About the R Squareds of the return objects, I forgot to normalise the R2s of the plot.anova values. They are actually the same (if you use McFadden instead of Nagelkerke):

image
florianhartig commented 2 years ago

Hi Max, looking at this, I think we should have a go at the vignette in a quiet moment and expand the explanation. I can also give this a go if a quiet day comes up in summer! Your explanation here would already be a big improvement over the text that is currently in the vignette.

F

wevertonbio commented 2 years ago

Hi @MaximilianPi,

Thank you very much for your quick and helpful answer!

Another doubt: in issue https://github.com/TheoreticalEcology/s-jSDM/issues/73, you said that "importance(model) calculates the variation partitioning of the three groups (env, spatial, and biotic) for each site".

Where exactly is this stored? I was only able to find the importance of components to individual species (df.imp in the reproducible example), not to individual sites.

MaximilianPi commented 2 years ago

Hi @WevertonBio,

the importance(....) can calculate the VP only for the species, not the sites. I made a mistake in #73.