florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Implement parametric dispersion tests? #242

Open · florianhartig opened 3 years ago

florianhartig commented 3 years ago

Connected to our discussion around #201 and the benchmarks we are currently doing, I am wondering if it would be worth re-implementing in DHARMa the ideas behind the other (parametric) dispersion tests that are around.

The background is that the existing implementations are mostly tied to particular model packages, although the underlying test ideas could probably be extended to other packages as well.

It seems to me that the basic test ideas that are around are:

I guess the first step would be to really understand how many DIFFERENT ideas for dispersion tests exist.

florianhartig commented 3 years ago

Started this in https://github.com/florianhartig/DHARMa/tree/overdispersion, see examples in https://github.com/florianhartig/DHARMa/blob/overdispersion/Code/DHARMaIssues/201.R

florianhartig commented 3 years ago

@Istalan, one question, as you had looked at the Pearson test in the glmmWiki: I had thought that if the sum of squared Pearson residuals is supposed to be chi2 distributed, I could also test for underdispersion, or two-sided. I tried this in https://github.com/florianhartig/DHARMa/blob/overdispersion/Code/DHARMaIssues/201.R, but the left-sided test was nearly always significant under H0.

Did you observe this before? Any idea why?

I guess this is the reason why all the package implementations test only the upper tail?
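For concreteness, the test logic in question is roughly the following (a minimal sketch, not the DHARMa implementation; pearsonDispersionTest is a made-up name, and it assumes the summed squared Pearson residuals are chi2 with df.residual(model) degrees of freedom under H0):

# Sketch of a two-sided Pearson chi2 dispersion test (hypothetical helper).
# Assumes sum(Pearson residuals^2) ~ chi2 with df = df.residual(model) under H0.
pearsonDispersionTest <- function(model, alternative = "two.sided") {
  stat <- sum(residuals(model, type = "pearson")^2)
  df <- df.residual(model)
  pUpper <- pchisq(stat, df = df, lower.tail = FALSE) # overdispersion
  pLower <- pchisq(stat, df = df, lower.tail = TRUE)  # underdispersion
  p <- switch(alternative,
              greater = pUpper,
              less = pLower,
              two.sided = 2 * min(pUpper, pLower))
  list(dispersion = stat / df, statistic = stat, df = df, p.value = p)
}

The underdispersion problem described above then amounts to df.residual(model) overstating the true df, which pushes the statistic into the lower tail even under H0.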

Istalan commented 3 years ago

Fascinating: the degrees of freedom in the performance package don't account for the number of groups. No matter whether you use 2, 10, or 1000 groups, the df is always n - 3 (4997), even though the variance of the conditional residuals must decrease with more random coefficients (I'd assume; certainly in LMMs). The results still seem to be chi-square distributed, but the effective df is way lower. I haven't found a strict pattern; it is certainly more of a reduction than just n - 3 - number of groups. Also, the QQ plot is a little steep. Maybe there is something there, but I don't know.

Here are the simulations I did, but just checking your code with number of groups 5, 100, 500, and 1000 will show the issue. This was for checking the distribution and looking for a pattern.

require(DHARMa)
require(lme4)
require(performance)

`%dopar%` <- foreach::`%dopar%`

# run FUN n_sim times on a parallel cluster and rbind the results into a matrix
my_simulate <- function(FUN, n_sim){
  cores <- parallel::detectCores() - 1
  message("parallel, set cores automatically to ", cores)
  cl <- parallel::makeCluster(cores)

  doParallel::registerDoParallel(cl)
  res <- foreach::foreach(i=1:n_sim, .packages=c("lme4", "DHARMa", "performance"), .combine = rbind) %dopar% 
    FUN()
  parallel::stopCluster(cl)
  return(res)
}

# one H0 run: Poisson GLMM, no overdispersion, 1000 groups; as.numeric()
# flattens the check_overdispersion result to
# (chi-square statistic, dispersion ratio, residual df, p-value)
test_fun <- function(){
  testData = createData(sampleSize = 5000, fixedEffects = 1, family = poisson(), randomEffectVariance = 1, 
                        overdispersion = 0, numGroups = 1000)

  # fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)
  fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)
  as.numeric(check_overdispersion(fittedModel))
}

test_res <- my_simulate(test_fun, 2000)
test_res

# does the mean chi-square statistic match the residual df reported by the model?
t.test(test_res[, 1], mu = test_res[1, 3])

# QQ plot of the statistics against a chi-square with the model-reported df
qqplot(x = qchisq(seq(10^(-5), 1-10^(-5), length.out = 1000), df = test_res[1, 3]), 
       y = test_res[, 1],
       main = "Test against GLMM df"
)
abline(0, 1)

# QQ plot against a chi-square whose df is the observed mean of the statistics
qqplot(x = qchisq(seq(10^(-5), 1-10^(-5), length.out = 1000), df = mean(test_res[, 1])), 
       y = test_res[, 1],
       main = "Test against observed df"
)
abline(0, 1)

# same setup again, but with only 10 groups
test_fun <- function(){
  testData = createData(sampleSize = 5000, fixedEffects = 1, family = poisson(), randomEffectVariance = 1, 
                        overdispersion = 0, numGroups = 10)

  # fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)
  fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)
  as.numeric(check_overdispersion(fittedModel))
}

test_res <- my_simulate(test_fun, 2000)
#test_res
t.test(test_res[, 1], mu = test_res[1, 3])

qqplot(x = qchisq(seq(10^(-4), 1-10^(-4), length.out = 1000), df = test_res[1, 3]), 
       y = test_res[, 1],
       main = "Test against GLMM df"
)
abline(0, 1)

qqplot(x = qchisq(seq(10^(-4), 1-10^(-4), length.out = 1000), df = mean(test_res[, 1])), 
       y = test_res[, 1],
       main = "Test against observed df"
)
abline(0, 1)
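
To see the df point directly, here is a minimal check reusing the createData setup from above; df.residual reports n - 3 = 4997 for both fits, regardless of numGroups:

# df.residual counts 2 fixed effects + 1 variance parameter, so it is
# n - 3 no matter how many groups the random effect has
for (k in c(10, 1000)) {
  d <- createData(sampleSize = 5000, fixedEffects = 1, family = poisson(),
                  randomEffectVariance = 1, overdispersion = 0, numGroups = k)
  m <- glmer(observedResponse ~ Environment1 + (1|group),
             family = "poisson", data = d)
  cat("numGroups =", k, "-> df.residual =", df.residual(m), "\n")
}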

P.S. Yes, I do have a new CPU. It's quite powerful and I'm having way too much fun with it :)

florianhartig commented 3 years ago

Hi @Istalan ... sorry, I only just saw that you had posted this; if I had read it before, our discussion today would have been much better informed on my side :p

Yes, the df issue is quite well known, but to be honest I hadn't realised that df.residual counts only 1 df for the RE, and what effect this has on the test.

I just had a look but didn't find any useful implementations of effective df in R that would apply across a range of models, although https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-015-0026-x suggests that e.g. Satterthwaite is still useful (they recommend the between-within method).

Again, as we discussed, I would much prefer something that can be extracted for all model classes, or ideally obtained from simulations only, rather than a package-specific solution.
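
Along those lines, one simulation-only option would be to exploit that a chi2_k variable has expectation k, like the "observed df" QQ plot above: refit the model to responses simulated under H0 and average the Pearson statistic. A rough sketch (effectiveDf is hypothetical; it assumes the model class supports simulate() and lme4::refit()):

# Hypothetical sketch: estimate the effective residual df by simulation,
# using E[chi2_k] = k. Not an existing DHARMa function.
effectiveDf <- function(model, nSim = 100) {
  simResponses <- simulate(model, nsim = nSim)   # responses under H0
  stats <- sapply(simResponses, function(y) {
    refitted <- lme4::refit(model, newresp = y)  # refit to simulated data
    sum(residuals(refitted, type = "pearson")^2) # Pearson chi2 statistic
  })
  mean(stats)  # the mean of a chi2_k sample estimates k
}

The Pearson test could then be run against this estimated df instead of df.residual, at the cost of nSim refits.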

florianhartig commented 3 years ago

Just to add to this: I wonder why all implementations of Satterthwaite in R are restricted to LMMs, when the approximation also holds for GLMMs (as the article suggests).

florianhartig commented 3 years ago

OK, but on the practical side, this is implemented, so the only thing left to check is that it is identical to the existing tests in performance etc.
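
For that check, something like the following should do (assuming the branch exposes the parametric test via testDispersion(..., type = "PearsonChisq"); adjust if the interface differs):

# compare the DHARMa Pearson test against performance::check_overdispersion
# (DHARMa interface assumed, see above)
library(DHARMa)
library(lme4)
library(performance)

testData <- createData(sampleSize = 500, fixedEffects = 1, family = poisson(),
                       randomEffectVariance = 1, overdispersion = 0, numGroups = 10)
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group),
                     family = "poisson", data = testData)

res <- simulateResiduals(fittedModel)
testDispersion(res, type = "PearsonChisq", alternative = "greater")

# statistic, df, and p-value should match:
check_overdispersion(fittedModel)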