florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
202 stars 22 forks source link

DHARMa residual checks with with GLMMadaptive using ‘hurdle semi-continous’ or ‘hurdle log-normal’ #423

Open florianhartig opened 1 month ago

florianhartig commented 1 month ago

For my study I model the response variable using a two-part mixed model approach, using the GLMMadaptive package in R. The family function there is called a ‘hurdle semi-continous’ or ‘hurdle log-normal’ function. On the vignette-page of the GLMMadaptive package, a wrapper function was provided to have the DHARMa functionality to be used with MixMod objects. I modified this function a little to work with NA’s in the dataset. The function is:

resids_plot2 <- function(object, y, nsim = 1000,

                         type = c("subject_specific", "mean_subject"),

                         integerResponse = NULL) {

  if (!inherits(object, "MixMod"))

    stop("this function works for 'MixMod' objects.\n")

  type <- match.arg(type)

  if (is.null(integerResponse)) {

    integer_families <- c("binomial", "poisson", "negative binomial",

                          "zero-inflated poisson", "zero-inflated negative binomial",

                          "hurdle poisson", "hurdle negative binomial")

    numeric_families <- c("hurdle log-normal", "beta", "hurdle beta", "Gamma")

    if (object$family$family %in% integer_families) {

      integerResponse <- TRUE

    } else if (object$family$family %in% numeric_families) {

      integerResponse <- FALSE

    } else {

      stop("non build-in family object; you need to specify the 'integerResponse',\n",

           "\targument indicating whether the outcome variable is integer or not.\n")

    }

  }

  # Select only subjects that were not removed from model (i.e., non-NA).

  subjects <- sort(as.numeric(rownames(model.frame(object))))

  y <- y[subjects]

  # Simulate

  sims <- simulate(object, nsim = nsim, type = type)

  fits <- fitted(object, type = type)

  dharmaRes <- DHARMa::createDHARMa(simulatedResponse = sims, observedResponse = y,

                                    fittedPredictedResponse = fits,

                                    integerResponse = integerResponse)

  DHARMa:::plot.DHARMa(dharmaRes, quantreg = FALSE)

}

In my case, the function returns a plot like below:

image

However, when I try to plot the residuals with the functions like you described in the DHARMa vignette, I can obtain the following plots

plot(simulateResiduals(tp0_hl, n = 1000)) #setting N to 1000 to match the nsim from the wrapper function

This shows a slightly better QQ-plot.

image

However, when testing the model with a zero-part random effects model, the differences increase even further:

Using the wrapper function:

image

Using the plot(simulateResiduals(model, nsim = 1000)) function:

image

The latter shows a much better QQ-plot than the wrapper function does, and one could deem the second plot to show a relatively nice fit (apart from the KS test) compared to the first plot.

I am not able to decipher where this difference comes from and which plot-method would be better to ‘trust’. I believe this has something to do with the createDHARMa function.

Also, I read on your github that you have been working together with the creator of the GLMMadaptive package, so, perhaps, the wrapper function is absolete now? Could you help me to point out which model-plot function would be best to use, working with MixMod objects and what leads to the differences of the plots?

florianhartig commented 1 month ago

Hi,

thanks for bringing this to my attention! I suspect the differences come from different defaults for type = c("subject_specific", "mean_subject") in DHARMa and the glmmAdaptive function.

I will have to check in more detail which settings exactly make more sense. My initial reaction is that subject-specific is likely fine for simulations, but extractions of predictions as

fits <- fitted(object, type = type)

in the glmmAdaptive function may be a problem for the reasons discussed in #43.