lack of convergence in tMsPGOcc example data when n.chains > 1

rakatz42 commented 2 months ago

Hey Jeff -

I'm trying to build a tMsPGOcc model and having issues with understanding what level of data are required for model convergence. Do you know why r-hat is occasionally > 1.1 for some detection parameters and often > 1.1 for occupancy parameters using the example data for tMsPGOcc and nchains > 1? Can this example be updated with an example showing sufficient mixing and convergence? We do have some detection data from previous efforts and I'm curious if there are ways to easily add priors for detection by species (instead of having the model draw from a community-level detection parameters)? Could lack of convergence be resolved with more sampling (n.batch = 5, should I be increasing batch length more than n.batch? note r-hat often <1.01 when I change batch.length from 25 to 250 - was there just not enough samples?) or is there not enough data to estimate parameters well in this example? Also, is there also code to assess model fit for tMsPGOcc?

Thank you for developing this amazing resource!

doserjef commented 2 months ago

Hi Rachel,

Thanks for the note and apologies for the confusion! When putting packages on CRAN, they have time limits for how long they want the examples to run so that the package can be built quickly. As a result of that, a lot of the examples in the documentation (like tMsPGOcc() as you point out) are not converged. I won't be able to change the actual example that is actually run, but I could add in commented code the associated MCMC criteria that can be used to get a model that has fully converged. I agree it's a bit confusing have non-converged examples, so thanks for pointing this out! I'm going to add this to my todo list and will keep this issue open until I add in comments to each documentation example to make time to convergence more clear.

For this specific example (and all the examples in the documentation), it is not converged simply because it is not run long enough. For example, tMsPGOcc() example converges nicely (low Rhat and high ESS) when using the following MCMC criteria:

n.batch <- 400
batch.length <- 25
n.burn <- 5000
n.thin <- 5
n.chains <- 3

I generally recommend keeping batch.length around 25 and then increasing n.batch when convergence hasn't been reached (and then can adjust n.burn and n.thin accordingly). You can also increase batch.length, but that could lead to slower mixing for certain types of parameters (the spatial covariance parameters for spatially-explicit models, or in the case of multi-season models the AR(1) parameters if fitting the model with ar1 = TRUE). This is discussed in a bit more detail here when introducing the single-species spatial occupancy model function.

In regards to the amount of data needed for tMsPGOcc(), the data requirements are pretty similar to a basic single-species occupancy model. The only parameters that can be pretty tough to estimate in tMsPGOcc() models is the AR(1) parameters (if ar1 = TRUE) and there are only a few seasons in the data set (i.e., < 5). However, those parameters can usually be estimated still in such cases, they will just be very uncertain.

For assessing model fit, you can use ppcOcc() to perform posterior predictive checks and waicOcc() to compare different models with WAIC. The way to do that is identical across different spOccupancy model types. So, to do a posterior predictive check and calculate WAIC for the example in the tMsPGOcc() documentation, you could run

# See ppcOcc manual page and intro vignette for more on these arguments
ppc.out <- ppcOcc(out, fit.stat = 'freeman-tukey', group = 1)
summary(out)
# Calculate WAIC
waicOcc(out)

Unfortunately there is not currently a way to specify a different prior for the species-specific detection parameters. That is something that could likely be added in the future, but there's a few other things that are on the todo list for new functionality so I'm not sure when that would be. Sorry about that!

Hope that helps! Let me know if I can clarify anything.

Jeff

doserjef commented 1 week ago

I added in a comment in each documentation example that explicitly states they are just toy examples and run times may need to be longer to fully achieve convergence, and so I am closing this issue.

doserjef / spOccupancy

lack of convergence in tMsPGOcc example data when n.chains > 1 #47