Closed by florianhartig 1 week ago
In the test with refit = T, new data is created from the fitted model (e.g. 100 new datasets), and then the model is refit for each of these new datasets. In the course of that, it often happens that the model (in this case glmer) does not converge, so this is not unexpected.
This could potentially be improved by changing the optimiser etc., but to be honest, I don't think it's worth the effort. I was wondering why you do it, and checking the help, I realised that there was still an old remark in simulateResiduals that pointed to using refit for dispersion tests. This is outdated, and I have removed that text now (in the development version). The DHARMa vignette gives the appropriate advice:
- if refit = F (default), new datasets are simulated from the fitted model, and residuals are calculated by comparing the observed data to the new data
- if refit = T, a parametric bootstrap is performed, meaning that the model is refit to all new datasets, and residuals are created by comparing observed residuals against refitted residuals
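As a sketch of the two modes (the data here are simulated with DHARMa's createData helper, so the variable names observedResponse, Environment1 and group are just the defaults of that illustrative example, not anything from the user's model):

```r
library(DHARMa)
library(lme4)

# create an example dataset and fit a Poisson GLMM to it
testData <- createData(sampleSize = 200, family = poisson())
fittedModel <- glmer(observedResponse ~ Environment1 + (1 | group),
                     family = poisson(), data = testData)

# refit = F (default): simulate new data from the fitted model and
# compare the observed data to the simulations
simDefault <- simulateResiduals(fittedModel, n = 250)

# refit = T: parametric bootstrap - the model is refit to each of the
# n simulated datasets; note the much smaller n, because this is slow
simRefit <- simulateResiduals(fittedModel, n = 20, refit = TRUE)
```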
The second option is much, much slower, and it also seemed to have lower power in some tests I ran. It is therefore not recommended for standard residual diagnostics! I only recommend using it if you know what you are doing and have particular reasons, for example if you suspect that the tested model is biased. A bias could, for example, arise in small-data situations, or when estimating models with shrinkage estimators that include a purposeful bias, such as ridge/lasso, random effects or the splines in GAMs. My idea was that simulated data would then not fit the observations, but that residuals for model fits on simulated data would show the same patterns/bias as model fits on the observed data.
Note also that refit = T can sometimes run into numerical problems if the fitted model does not converge on the newly simulated data.
Note also the following comment in the help of testDispersion:
The results of the dispersion test can differ depending on whether it is evaluated on conditional (= conditional on fitted random effects) or unconditional (= REs are re-simulated) simulations. You can switch between conditional and unconditional simulations in simulateResiduals if this is supported by the regression package that you use. The default in DHARMa is to use unconditional simulations, but I have often found that conditional simulations are more sensitive to dispersion problems. I recommend trying both, as neither test should be positive if the dispersion is correct.
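For an lme4 model, the switch between the two simulation types can be sketched as below; the re.form argument is passed through simulateResiduals to lme4's simulate method, where re.form = NULL means "condition on the fitted random effects" (fittedModel stands for any previously fitted merMod object):

```r
library(DHARMa)

# unconditional simulations (DHARMa default): random effects are re-simulated
simUncond <- simulateResiduals(fittedModel)
testDispersion(simUncond)

# conditional simulations: condition on the fitted random effects
# (re.form = NULL is forwarded to lme4::simulate)
simCond <- simulateResiduals(fittedModel, re.form = NULL)
testDispersion(simCond)
```

If the dispersion is correct, neither test should come out significant.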