Adding spatial component reduces R2 of model

TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling

https://cran.r-project.org/web/packages/sjSDM/index.html

GNU General Public License v3.0


Adding spatial component reduces R2 of model #109

Closed: AndrewCSlater closed this issue 1 year ago

AndrewCSlater commented 2 years ago

Hi there,

I ran a couple of models using the same environmental and survey data, including the spatial coordinates in one but not the other. The model without the coordinates had a higher R2. The anova seems to imply that the spatial component actively makes the model worse, but I don't understand whether or how that can be true.

Could you please take a look and let me know whether I have done something obviously wrong or misunderstood something? Many thanks!

The code and its output are attached as a text file: output.txt

MaximilianPi commented 2 years ago

Hi @AndrewCSlater,

You should also scale the XY coordinates (you scaled env), which may explain why the models get worse with the spatial component.
If you use a linear object to model space, you should use a trend surface model, ~0+X+Y+X:Y+I(X^2)+I(Y^2), because the interaction alone is often not flexible enough.
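As a rough illustration of the two suggestions above (scale the coordinates, then use a second-order trend surface), here is a minimal NumPy sketch that builds the standard trend-surface columns: both linear terms, their interaction, and both squares, computed from the scaled coordinates. It is not sjSDM code. sjSDM is an R package and builds this matrix itself from the formula. The helper names `scale` and `trend_surface` and the simulated coordinates are hypothetical.

```python
import numpy as np

def scale(v):
    """Center to mean 0 and divide by the sample SD (ddof=1), like R's scale()."""
    return (v - v.mean()) / v.std(ddof=1)

def trend_surface(x, y):
    """Second-order trend surface without intercept: X, Y, X*Y, X^2, Y^2.

    The coordinates are scaled first, so the squared and interaction
    terms are built from the scaled values.
    """
    X, Y = scale(x), scale(y)
    return np.column_stack([X, Y, X * Y, X**2, Y**2])

# Hypothetical projected coordinates in metres: orders of magnitude larger
# than scaled environmental covariates if left unscaled.
rng = np.random.default_rng(0)
x = rng.uniform(400_000, 450_000, size=50)
y = rng.uniform(5_200_000, 5_260_000, size=50)

S = trend_surface(x, y)
print(S.shape)
```

Each column of `S` is now on a scale comparable to the scaled environmental covariates, which is the point of the first suggestion.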

Btw, 1000 iterations is a bit of overkill: the model should converge within 100-200 iterations, and if it doesn't, you should increase or decrease the learning_rate.
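Why the learning rate and the number of iterations go together can be seen in a toy sketch: plain gradient descent on the one-dimensional quadratic (w - 3)^2. This is an illustration under simplifying assumptions, not sjSDM's optimizer, and `steps_to_converge` is a hypothetical helper. A small learning rate needs many more steps, and one that is too large makes the iterates diverge.

```python
def steps_to_converge(lr, target=3.0, tol=1e-6, max_iter=10_000):
    """Gradient descent on f(w) = (w - target)**2, starting from w = 0.

    Returns the number of steps until |w - target| < tol, or None if the
    iterates blow up or max_iter is reached without converging.
    """
    w = 0.0
    for step in range(1, max_iter + 1):
        grad = 2 * (w - target)   # derivative of (w - target)**2
        w -= lr * grad            # one optimization step
        err = abs(w - target)
        if err < tol:
            return step
        if err > 1e6:             # step size too large: iterates diverge
            return None
    return None

for lr in (0.01, 0.1, 1.1):
    print(f"learning rate {lr}: {steps_to_converge(lr)} steps")
```

For this toy objective, lr = 0.5 lands on the minimum in a single step only because the function is an exact quadratic. Real likelihoods give no such guarantee, which is why the learning rate is tuned rather than derived.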

AndrewCSlater commented 2 years ago

Hi Max,

Brilliant, thank you! While I can't claim to understand the maths behind it, scaling the coordinates increased the R2, and using the formula you gave for the linear spatial component improved the model R2 even further.

Re iterations: I ran 1000 thinking it was quite a small number, because I previously used the Hmsc package, where we ran many thousands of iterations as burn-in and many thousands more once we started recording them, thinning them to keep every nth sample, to end up with multiple chains several hundred samples long. Does sjSDM do something similar but more automated, or does it use different terminology? I'm happy with fewer iterations, as it's much faster!

MaximilianPi commented 2 years ago

Hi Andrew,

Scaling: it's not about the math, it's about how we optimize the model. Stochastic gradient descent, combined with the Monte Carlo (MC) approximation of the multivariate probit likelihood, has a much harder time when some variables are on a completely different scale from the others.
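A simplified sketch of the scaling effect: full-batch gradient descent on a least-squares problem with one already-scaled covariate and one raw coordinate. The stochastic minibatching and the MC probit approximation that sjSDM actually uses are deliberately left out, and the data, the helper `gd_steps`, and the 1/L step-size rule are assumptions chosen for the illustration. With the raw coordinate, the loss surface is extremely elongated, so any safe step size is tiny along the covariate direction and the optimizer crawls.

```python
import numpy as np

def gd_steps(X, y, tol=1e-8, max_iter=20_000):
    """Full-batch gradient descent on the mean squared error of y ~ X @ w.

    Uses the fixed step size 1/L (L = largest Hessian eigenvalue) and returns
    the number of steps until the gradient norm drops below tol, or max_iter.
    """
    n = X.shape[0]
    H = 2 * X.T @ X / n                      # Hessian of the MSE (constant here)
    lr = 1.0 / np.linalg.eigvalsh(H).max()   # a standard safe fixed step size
    w = np.zeros(X.shape[1])
    for step in range(1, max_iter + 1):
        grad = 2 * X.T @ (X @ w - y) / n
        if np.linalg.norm(grad) < tol:
            return step
        w -= lr * grad
    return max_iter

rng = np.random.default_rng(1)
n = 200
env = rng.normal(size=n)                 # environmental covariate, already scaled
coord = rng.uniform(0, 1000, size=n)     # hypothetical raw coordinate, e.g. metres
y = 0.5 * env + 0.002 * coord + rng.normal(scale=0.1, size=n)

X_raw = np.column_stack([env, coord])
X_scaled = np.column_stack([env, (coord - coord.mean()) / coord.std()])

print("steps with raw coordinate:   ", gd_steps(X_raw, y))
print("steps with scaled coordinate:", gd_steps(X_scaled, y))
```

Both fits describe the same relationship. Only the parameterization differs, yet the scaled version converges in a handful of steps while the raw version runs out of iterations.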

Iterations: sjSDM is a point estimator based on maximum likelihood estimation (MLE), and the iteration argument is the number of optimization steps used to obtain the point estimates. Hmsc is based on Bayesian inference: it infers the distributions of all model parameters using MCMC sampling, and there the number of iterations is the number of samples drawn to approximate those distributions. MLE is generally faster than Bayesian inference with MCMC sampling.
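To make the distinction concrete, here is a toy sketch for a single parameter: the mean of normally distributed data with a known standard deviation of 1. The MLE is one number (for this model simply the sample mean). By contrast, a random-walk Metropolis sampler with a flat prior, run with burn-in and thinning like the Hmsc workflow described earlier in the thread, returns many draws approximating the whole posterior. This is not what sjSDM or Hmsc run internally, and the data, the function `metropolis`, and its settings are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=2.0, scale=1.0, size=100)  # toy data, SD assumed known (= 1)

def log_lik(mu):
    """Normal log-likelihood of the data for mean mu (SD = 1), up to a constant."""
    return -0.5 * np.sum((data - mu) ** 2)

# MLE: a single point estimate; for this model it is the sample mean.
mle = data.mean()

def metropolis(n_iter=20_000, burn_in=5_000, thin=10, step=0.3):
    """Random-walk Metropolis with a flat prior; returns thinned posterior draws."""
    mu, ll = 0.0, log_lik(0.0)
    draws = []
    for i in range(n_iter):
        prop = mu + rng.normal(scale=step)       # propose a nearby value
        ll_prop = log_lik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:  # accept with prob min(1, ratio)
            mu, ll = prop, ll_prop
        if i >= burn_in and (i - burn_in) % thin == 0:
            draws.append(mu)                      # keep every `thin`-th draw after burn-in
    return np.array(draws)

samples = metropolis()
print(f"MLE: {mle:.3f}")
print(f"posterior mean {samples.mean():.3f}, posterior SD {samples.std():.3f}, "
      f"{samples.size} draws")
```

With 100 observations and SD 1, the posterior SD should be close to 1/sqrt(100) = 0.1. The draws therefore carry uncertainty information that the single MLE does not. The cost is 20,000 likelihood evaluations for a location estimate that the MLE gives directly.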

So an iteration in sjSDM is not comparable to an iteration in Hmsc; the sjSDM MLE should converge within 100-200 steps.