florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

testDispersion() default fails to detect overdispersion in a Poisson GLMM #402

Open HugoMH opened 5 months ago

HugoMH commented 5 months ago

DHARMa::testDispersion() doesn't detect overdispersion in an overdispersed Poisson GLMM.

Here is an MWE:

TooHot = read.csv('https://raw.githubusercontent.com/HugoMH/Stat_M1_BootGLMM/main/TDs/Data/Suicides%20and%20Ambient%20Temperature.csv')
head(TooHot)

TooHot$Temperature2 = TooHot$Temperature^2

Mpoisson = glm(Suicides ~ Temperature + Temperature2 + Country
            ,data = TooHot
            ,family = poisson(link = 'log')) # !!   <°)))><    !!

MpoissonRE = lme4::glmer(Suicides ~ Temperature + Temperature2 + (1|Country) 
              ,data = TooHot, family = poisson(link = 'log'))

DHARMa::testDispersion(Mpoisson,plot = F)
# dispersion = 11350, p-value < 2.2e-16

DHARMa::testDispersion(MpoissonRE,plot = F) # /!\ PROBLEM
# dispersion = 0.7349, p-value = 0.392

### The same
DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE),plot = F) # /!\ PROBLEM
# dispersion = 0.7349, p-value = 0.392

### The same
DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE,refit = F),type = 'DHARMa'      ,plot = F) # /!\ PROBLEM
# dispersion = 0.7349, p-value = 0.392

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE,refit = F),type = 'PearsonChisq',plot = F) # Ok
# dispersion = 2650.7, df = 337, p-value < 2.2e-16

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE,refit = T),type = 'DHARMa'      ,plot = F) # Ok
# dispersion = 2725.4, p-value < 2.2e-16

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE,refit = T),type = 'PearsonChisq',plot = F) # Ok
# dispersion = 2650.7, df = 337, p-value < 2.2e-16
florianhartig commented 5 months ago

Hello Hugo,

just as a general point: we don't expect the GLM to have the same dispersion as the GLMM (because the RE absorbs dispersion), so unless you have simulated this data, we wouldn't know for sure by fitting the GLM that the GLMM should also show overdispersion. In this case, however, it is clear that the default dispersion test fails to flag the overdispersion that exists.

The reason is that dispersion tests based on unconditional simulations can be far less powerful than tests on conditional simulations (the issue gets worse the stronger the random effect), so you should prefer to test with conditional residuals in the case of strong REs. In the help, I therefore have the warning:

Important: for either refit = T or F, the results of type = "DHARMa" dispersion test will differ depending on whether simulations are done conditional (= conditional on fitted random effects) or unconditional (= REs are re-simulated). How to change between conditional or unconditional simulations is discussed in simulateResiduals. The general default in DHARMa is to use unconditional simulations, because this has advantages in other situations, but dispersion tests for models with strong REs specifically may increase substantially in power / sensitivity when switching to conditional simulations. I therefore recommend checking dispersion with conditional simulations if supported by the used regression package.

The way to do this in lme4 would be to run

res2 <- DHARMa::simulateResiduals(MpoissonRE, re.form = NULL)
DHARMa::testDispersion(res2)

which would give you the result you expect. Alternatively, you can also run the analytical test

DHARMa::testDispersion(MpoissonRE, type = "PearsonChisq")

but this can have certain other problems in case of a large number of RE groups.

None of this requires refit, and I also wouldn't recommend using refit in this case, for reasons of speed.

I appreciate that this approach is not 100% intuitive, and the user shouldn't have to read the help in detail before using a function to get reasonable results. I have thought about switching the way residuals are simulated in the background for the dispersion tests, but then the dispersion tests would have a different default than the rest of the package, and conditional residuals are not available for all R packages (in particular, glmmTMB doesn't support them), so if I switched automatically, the function would behave differently for different R packages.
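For completeness, a minimal sketch of how one could apply this advice manually today (dispersionCheck is just a hypothetical helper name, not part of DHARMa): use conditional simulations where the fitting package supports re.form (e.g. lme4), and fall back to the analytical Pearson test otherwise.

# Minimal sketch, assuming the MWE objects above; dispersionCheck is hypothetical,
# not a DHARMa function.
dispersionCheck <- function(model, n = 250) {
  if (inherits(model, "merMod")) {  # lme4 models support conditional simulations
    res <- DHARMa::simulateResiduals(model, n = n, re.form = NULL)
    DHARMa::testDispersion(res, plot = FALSE)
  } else {                          # e.g. glmmTMB: no conditional simulations
    DHARMa::testDispersion(model, type = "PearsonChisq", plot = FALSE)
  }
}

dispersionCheck(MpoissonRE)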

HugoMH commented 5 months ago

Hi,

Thank you for your answer,

The main problem I think is that when you run

DHARMa::testDispersion(MpoissonRE)

The function testDispersion internally calls simulateResiduals(MpoissonRE), while, for GLMMs, it should call testDispersion(simulateResiduals(MpoissonRE, re.form = NULL, refit = T)) (regarding refit = T, see below).

If this is not possible, as for glmmTMB, then testDispersion should return the classical PearsonChisq dispersion, with a warning about the change of default compared to the other packages.

However, I still think that re.form = NULL is not the solution:

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL           ), type = 'DHARMa')
# dispersion = 1.8607, p-value < 2.2e-16

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL           ), type = 'PearsonChisq')
# dispersion = 2650.7, df = 337, p-value < 2.2e-16

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = T), type = 'DHARMa')
# dispersion = 2731.3, p-value < 2.2e-16

Regarding the description of the argument refit, I don't fully understand the sentence "if FALSE, new data will be simulated and scaled residuals will be created by comparing observed data with new data."

Overdispersion has to be detected by comparing the observed residual variance to the assumed variance. The assumed variance cannot be estimated from the fitted model; it has to come from the parametric bootstrap, even though that is somewhat slower. To speed up computation, we can lower the number of simulations, for example

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = T, n = 10), type = 'DHARMa')

or we can get some rest and be patient 🙂
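As a rough illustration of the trade-off (using the objects from the MWE above; exact timings will of course depend on the machine), the runtime of refit = T grows roughly with the number of simulations, since the model is refitted once per simulated data set:

# refit = TRUE refits the GLMM once per simulated data set, so runtime scales
# roughly linearly with n; compare for example:
system.time(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = TRUE, n = 10))
system.time(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = TRUE, n = 100))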

Cheers, and thanks anyway for developing this package.

hugo

florianhartig commented 5 months ago

Hi Hugo,

yes, testDispersion simulates the residuals with the default settings, but if you want to modify this, you can do it as in

res2 <- DHARMa::simulateResiduals(MpoissonRE, re.form = NULL)
DHARMa::testDispersion(res2)

which I would recommend anyway. Do you think that this is inconvenient or confusing?

If I understand you correctly, your second problem is that the two tests give different dispersion parameters? Differences are not necessarily unexpected, as they are based on different test statistics. Maybe the help in DHARMa is not entirely clear, but there are basically two fundamentally different options in DHARMa:

1) Default: comparison of SD of simulated / observed residuals

2) Comparison of the Pearson Chi2, which can either be done analytically via the df (option PearsonChisq) or via refit = T. The advantage of refit is that it avoids the problem that the Pearson residuals are not exactly Chi2 distributed under a mixed model, which biases the analytical test towards underdispersion.

So, in that sense it is no surprise that your options 2 and 3 are very similar, and 1 is different. Of 2 and 3, you would expect 3 to be more exact.

Whether 1 is a problem is debatable to me. In our tests (I had an MSc thesis about this recently), the default tests with re.form = NULL have excellent type I error rates and power. In fact, there doesn't seem to be an advantage to PearsonChisq. The only issue may be (we haven't looked at this in detail) that the test statistic is not really proportional to what you would expect as a dispersion parameter. If you plot the residuals, they do indeed look more widely dispersed than 1.8. So, is the 1.8 the problem you see, i.e. you would like the value to be higher?
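To make option 1 more concrete, here is a simplified sketch of the kind of statistic it computes (this is not the exact DHARMa implementation, just the underlying idea of comparing the spread of observed vs. simulated data around the model expectation, using conditional simulations and the MWE objects above):

# Simplified sketch of the idea behind option 1 (not the exact DHARMa code)
sims  <- as.matrix(simulate(MpoissonRE, nsim = 250, re.form = NULL)) # conditional simulations
mu    <- rowMeans(sims)                          # expected response per observation
sdObs <- sd(TooHot$Suicides - mu)                # spread of the observed data
sdSim <- apply(sims, 2, function(s) sd(s - mu))  # spread of each simulated data set
sdObs / mean(sdSim)                              # ratio > 1 suggests overdispersion
mean(sdSim >= sdObs)                             # crude one-sided simulation p-value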

HugoMH commented 5 months ago

Hi

Yes, I think that recommending not to use the default of a package you develop is problematic and confusing. Most people will not read the recommendations, and many will draw wrong conclusions.

In addition, as mentioned in 'Overdispersion and Diagnostics (Stat 544, Lecture 11)' https://app.box.com/s/ejcig9z4shtt3wodq6nfb2ojignzps11

"A test for φ = 1? There is no reliable test for φ = 1. A test is not important anyway, because what really matters is how far away φ is from one, not how much evidence you can accumulate against the null hypothesis H0 : φ = 1."

(For a similar discussion about normality in linear models, see https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless)

So the effect size of the deviation from a model's assumptions is more useful than a p-value.
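In that spirit, the classical Pearson dispersion estimate can be computed by hand as an effect size (this should essentially reproduce the dispersion reported by the PearsonChisq option):

# Classical Pearson dispersion estimate for the GLMM, as a point estimate of phi
prs <- residuals(MpoissonRE, type = "pearson")
sum(prs^2) / df.residual(MpoissonRE)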

You say that the first approach is accurate. Maybe that holds for the p-value, I don't know, but it clearly does not for the effect size, and some people use only effect sizes in this context.

The first approach yields an estimate that is thousands of times lower than the other approaches.

Some consider that overdispersion is negligible in models with φ < 2. This threshold is way too conservative, but even people using it would end up interpreting the p-values of this heavily overdispersed model.

Cheers and good luck for the future.