florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
201 stars 21 forks source link

Error in simulateResiduals when binomial outcome coded as factor in gamm4 model fit #334

Closed jgrembi closed 2 years ago

jgrembi commented 2 years ago

Hi, thanks for the great package! I've been using it to run diagnostics on gam models run using the gamm4 package. I'm running binomial models with a binary outcome (whether a child has diarrhea or not) based on a set of risk factors (age, household wealth, temperature, rainfall, etc). If I code the outcome as a numeric variable, simulateResiduals works as I expect. If I code it as a factor (what I understand to be best practice when running models in R for binary variables), I get a repeated warning [‘*’ not meaningful for factors] and then finally an error [Error in securityAssertion("Simulations from your fitted model produce NA values. DHARMa cannot calculated residuals for this. This is nearly certainly an error of the regression package you are using"].

The gamm4 model output contains both a $gam and $mer fit, the latter coming from lme4, and what I'm using as the object provided to simulateResiduals. What I noticed is that gamm4 output always contains a "(weights)" column in the model.frame of all 1s if no weights are provided, regardless if the y is a factor or numeric. I think the error I'm getting is occurring when the out object obtained from out = simulate(object) [which is factor 0/1 output] is being multiplied by the model.frame$(weights) in line 244 of getSimulations.default and returning a NA value. So, I'm wondering if the gamm4 output is an edge case that provides weights for a factor y when that is only provided by other model fitting packages for numeric y. If I am correct, I think the error I'm getting can be fixed by switching lines 248-255 to just before line 242 in your file DHARMa/R/compatibility.R

Or maybe I'm missing something else really important since I'm not very familiar with the model output from different packages besides gamm4!

florianhartig commented 2 years ago

Hello Jess, thanks for reporting this.

I'm surprised that this works at all - gamm4 is not officially supported, and I didn't test it so far. When I try a simple example

library(gamm4)
library(DHARMa)

dat <- createData(factorResponse = T, family = binomial())

fit <- gamm4(observedResponse ~ Environment1, random= ~ (1|group), 
                            family = binomial, data = dat)
res <- simulateResiduals(fit, plot = T)

I get an error (Error in UseMethod("family") : no applicable method for 'family' applied to an object of class "list"). Could I ask what function call you are using, so that the DHARMa simulations work?

In principle, I'm happy to consider adding gamm4 support to DHARMa, but it will require a bit of testing / coding. After a very superficial glance at the objects, I see that the gamm4 object is a mix of lme4 and mgcv objects. It seems the simulate function is implemented for the mgcv::gam object, but I'm not sure if it includes the REs when simulating.

Assuming that the 1/0 simulations work fine for you (which I can't guarantee, see above), you can of course code 0/1 instead of factor. There is no fundamental advantage of the factor coding, other than 1/0 harder to read in outputs / plots.

jgrembi commented 2 years ago

Hi Florian, Thanks for your response. Instead of providing simulateResiduals(fit, plot = T), I'm providing just the $mer portion of the gamm4 fit (which includes the REs). So the code would be simulateResiduals(fit$mer, plot = T) I'd prefer to keep the outcome as a factor variable for internal consistency with how we run other models in our research group. So I tried to test the change I suggested by making a new getSimulations.defaultfunction locally (in my global environment), but I think because it's an internal function within your package, simulateResiduals doesn't seem to be looking in my global environment and seeing the updated function that I created. Do you know if there is a way to force this in R for package internal functions?

Best, Jess

florianhartig commented 2 years ago

Hello Jess,

OK, thanks, now it makes sense! You were right, the problem was that gamm4 is writing a weight column in the model frame, which confused my wrappers for lme4. I have just pushed a fix that should solve the error message (i.e. should now be able to calculate the residuals). If you install the DHARMa from GitHub (see https://github.com/florianhartig/DHARMa), it should work.

I have made a few superficially tests and all seemed fine - assuming that gamm4 doesn't do anything weird and the object behaves like a normal lme4 model, it should be safe to use this.

Cheers Florian

jgrembi commented 2 years ago

Hi Florian, I checked and the new version works great:) Thanks a lot! Jess