florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

DHARMa::dispersionTest - boundary (singular) fit: see ?isSingular #178

Closed. florianhartig closed this issue 1 week ago.

florianhartig commented 4 years ago

Question from a user:

Model 1
library(lme4)
overPS <- glmer(Total ~ Treatment + (1 | year/month) + (1 | field), data = mydata, family = poisson(link = "log"))

boundary (singular) fit: see ?isSingular
summary(overPS)  # AIC = 1076.2, lowest AIC
library(DHARMa)
simulateResiduals(overPS, refit = T)

boundary (singular) fit: see ?isSingular

boundary (singular) fit: see ?isSingular

boundary (singular) fit: see ?isSingular

And then it just keeps showing this warning.

Model 2
If I replace year/month with Sampling.Date, it seems to work:
overPS <- glmer(Total ~ Treatment + (1 | Sampling.Date) + (1 | field), data = mydata, family = poisson(link = "log"))
summary(overPS)  # AIC = 1076.2, lowest AIC
sim_PS <- simulateResiduals(overPS, refit = T)

boundary (singular) fit: see ?isSingular

Warning message:

In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :

  Model failed to converge with max|grad| = 0.00238227 (tol = 0.002, component 1)
testDispersion(sim_PS)

data:  sim_PS

dispersion = 5.1988, p-value < 2.2e-16

alternative hypothesis: two.sided

So I wonder if you happen to know why model 1 does not work, and whether I can ignore the warning for the 2nd model?
florianhartig commented 4 years ago

In the test with refit = T, new data is created from the fitted model (e.g. 100 new datasets), and then the model is refit for each of these new datasets. In the course of that, it often happens that the model (in this case glmer) does not converge, so this is not unexpected.
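For illustration, here is a minimal sketch of what the refit procedure does conceptually for an lme4 model (simplified; the actual DHARMa code handles more model classes and computes the residuals differently):

library(lme4)

# conceptual sketch of refit = T: simulate new responses from the fitted
# model, refit the model to each simulated dataset, and collect residuals
nSim <- 100
newResponses <- simulate(overPS, nsim = nSim)   # one column per simulated dataset

refittedResiduals <- lapply(newResponses, function(y) {
  # refitting on simulated data is where the repeated
  # "boundary (singular) fit" / convergence warnings come from
  residuals(refit(overPS, newresp = y))
})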

This could potentially be improved by changing the optimiser etc., but to be honest, I don't think it's worth the effort. I was wondering why you do it, and checking the help, I realised that there was still an old remark in simulateResiduals that pointed to using refit for dispersion tests. This is outdated, and I have removed this text now (in the development version). The DHARMa vignette gives the appropriate advice:

  • if refit = F (default), new datasets are simulated from the fitted model, and residuals are calculated by comparing the observed data to the new data

  • if refit = T, a parametric bootstrap is performed, meaning that the model is refit to all new datasets, and residuals are created by comparing observed residuals against refitted residuals

The second option is much, much slower, and also seemed to have lower power in some tests I ran. It is therefore not recommended for standard residual diagnostics! I only recommend using it if you know what you are doing and have particular reasons, for example if you estimate that the tested model is biased. A bias could, for example, arise in small data situations, or when estimating models with shrinkage estimators that include a purposeful bias, such as ridge/lasso, random effects or the splines in GAMs. My idea was then that simulated data would not fit the observations, but that residuals for model fits on simulated data would have the same patterns/bias as model fits on the observed data.
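In practice, for a routine dispersion check on the model above, the default (refit = F) residuals are the recommended route, roughly like this:

# fast default: residuals from simulations of the fitted model (refit = F)
sim_default <- simulateResiduals(overPS)
plot(sim_default)
testDispersion(sim_default)

# parametric bootstrap: only for special cases, e.g. suspected estimator bias
# (slow, and refits on simulated data may fail to converge)
sim_refit <- simulateResiduals(overPS, refit = TRUE, n = 100)
testDispersion(sim_refit)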

Note also that refit = T can sometimes run into numerical problems, if the fitted model does not converge on the newly simulated data.

Note also the following comment in the testDispersion help:

The results of the dispersion test can differ depending on whether it is evaluated on conditional (= conditional on fitted random effects) or unconditional (= REs are re-simulated) simulations. You can change between conditional or unconditional simulations in simulateResiduals if this is supported by the regression package that you use. The default in DHARMa is to use unconditional simulations, but I have often found that conditional simulations are more sensitive to dispersion problems. I recommend trying both, as neither test should be positive if the dispersion is correct.
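For an lme4 model, the comparison might look as follows (assuming, as in current DHARMa versions, that extra arguments to simulateResiduals are passed on to lme4's simulate(), where re.form = NULL conditions on the fitted random effects):

# unconditional simulations (default): random effects are re-simulated
sim_uncond <- simulateResiduals(overPS)
testDispersion(sim_uncond)

# conditional simulations: condition on the fitted random effects
# (re.form = NULL is handed through to lme4's simulate.merMod)
sim_cond <- simulateResiduals(overPS, re.form = NULL)
testDispersion(sim_cond)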