florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
201 stars 21 forks source link

Problem with plot against the predictor #338

Closed Olivia-Vjorn closed 2 years ago

Olivia-Vjorn commented 2 years ago

Hello,

I am trying to run the code described in the section "Detecting missing predictors or wrong functional assumptions" from (https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#detecting-missing-predictors-or-wrong-functional-assumptions).

I am using lmer() to run a model which has a Time x Study Condition interaction with a random slope for time and random intercept for participant ID. Time is numeric 0-3, study condition is a factor with 2 levels, and participant ID is a character. There are NAs in the outcome but none for the predictors (long format).

I tried using the following code: lmer.model = lmer(QOLmental ~ Time*StudyCondition + (1 + Time | participant_id), data = data) simulationOutput = simulateResiduals(lmer.model) par(mfrow = c(1,2)) plotResiduals(simulationOutput, data$Time) plotResiduals(simulationOutput, data$StudyCondition)

But I got this error: Error in ensurePredictor(simulationOutput, form) : DHARMa: residuals and predictor do not have the same length. The issue is possibly that you have NAs in your predictor that were removed during the model fit. Remove the NA values from your predictor.

I double checked that there are no NA values in either Time nor study condition.

Is this a bug? I am using DHARMa version 0.4.5 and lme4 version 1.1-29.

Please let me know what other information I can give to help.

Thanks,

Olivia

florianhartig commented 2 years ago

Hello Olivia,

could you send me the data for this, ideally directly here, or else (if confidential) via email?

Cheers, Florian

Olivia-Vjorn commented 2 years ago

Hello,

I was able to recreate the error using the following (my dataset has more missing values but just adding one worked):

`library(dplyr) library(lme4) library(DHARMa)

set.seed(1) ID = rep(1:340,each=4) Time = rep(0:3,340) Arm = rep(c(0,0,0,0,1,1,1,1), 170) QOL = rnorm(1360, mean = 50, sd = 10) data = data.frame(ID, Time, Arm, QOL) data[c(4), 4] = NA

model = lmer(QOL ~ Time*Arm + (1 + Time | ID), data = data) simulationOutput = simulateResiduals(model) par(mfrow = c(1,2)) plotResiduals(simulationOutput, data$Time) plotResiduals(simulationOutput, data$Arm)`

Are you able to reproduce the error from this?

Thanks, Olivia

florianhartig commented 2 years ago

yes, I can reproduce this - the issue is that you have an NA in your data - remember that all NA rows are discarded in the lm, so you have to provide the same data in plotResiduals that is used in the model. The easiest ways to do this would be

  1. do a complete.cases() before fitting the regression, to remove the NA lines from your data
  2. you can also extras the data that was used to fit them model via model.frame()
x = model.frame(model)
plotResiduals(simulationOutput, x$Time)

I know this is all not sure convenient, and it's on my list to provide a bit more help on this for the user, but at the moment this is how it is.

Best, Florian

Olivia-Vjorn commented 2 years ago

Ah, thank you for that explanation and the work around to use model.frame!

florianhartig commented 2 years ago

Glad this was helpful. I will close the issue!