florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Could not find documentation on red-shaded area around smooth spline #411

Open MikeACG opened 1 month ago

MikeACG commented 1 month ago

Hey there! I was wondering what exactly the red-shaded area in the following plot is, and why it looks so spiky:

[image: predictor vs. residuals plot showing the red-shaded area]

The context is a big Poisson GLM, and this is a predictor vs. residuals plot for 1 of about 40 predictors. By looking at the code of the plotResiduals function, I figured out that for large datasets DHARMa switches from quantile regression to a smooth spline to draw the dashed red line. However, I'm unable to figure out what the shaded area is. I see a polygon being drawn in the function, but seemingly only when quantile regression is used, so I'm a bit confused.

EDIT: I forgot to say that I'm running version 0.4.6 on R 4.3.0. Also, for other (even bigger) datasets, the red-shaded area does not seem to be drawn at all. At first I thought the area was just too small to see in the other dataset, but it really does appear that it's not drawn at all.
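
For reference, the switch between quantile regression and the smooth spline described above can also be set explicitly through the quantreg argument of plotResiduals. The following is a minimal sketch on simulated data from createData, not on the dataset from this report. The small sample size is an assumption chosen to keep the quantile regression fast. According to the plotResiduals help page, the default (quantreg = NULL) uses quantile regression only below roughly 2000 observations.

```r
library(DHARMa)

set.seed(1)

# Simulated stand-in data (assumption: not the dataset from the report)
testData <- createData(sampleSize = 500, family = poisson())
fittedModel <- glm(observedResponse ~ Environment1,
                   family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)

# Force quantile regression (the default choice for small datasets)
plotResiduals(simulationOutput, form = testData$Environment1, quantreg = TRUE)

# Force the smooth spline (the default choice for large datasets)
plotResiduals(simulationOutput, form = testData$Environment1, quantreg = FALSE)
```

Comparing the two plots side by side may help narrow down which code path draws a given element.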

florianhartig commented 1 month ago

Hi,

Can you send me the complete code that you use to produce these plots? I have never seen this red shaded area, and it doesn't occur on my system. Example:

library(DHARMa)

# Simulate a large Poisson dataset and fit a matching GLM
testData <- createData(sampleSize = 11000, family = poisson())
fittedModel <- glm(observedResponse ~ Environment1,
                   family = "poisson", data = testData)

# Calculate and plot the DHARMa residuals
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput)

Best, F

MikeACG commented 1 month ago

Hi Florian, thanks for your response.

> I have never seen this red shaded area

Wow! Actually, this makes a lot of sense, because now I'm sure it must be due to something very obscure. The plot is generated inside a big pipeline that fits many regressions for different datasets. The pipeline runs R through Rscript inside a Singularity image, so the setup is a bit convoluted. I manually ran the script for the dataset involved in the plot in an interactive R session, and I too could not replicate the plot! I also realized that this particular regression is based on glm.nb from the MASS package and not on plain Poisson regression. I can replicate the shaded area when I run the pipeline itself, which is super weird. With both approaches, the plot is exactly the same except for the shaded area.

I will try to put together a working example that reproduces the plot in a less complicated way, but I think it's safe to say that whatever triggers this is fairly obscure and probably not a huge problem right now.
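
When a plot differs between an Rscript run and an interactive session, recording the environment in both places and comparing the results is one way to start narrowing things down. The sketch below uses only base R. The helper name collect_env_info and the list of packages it checks are hypothetical choices for illustration. One known difference is that non-interactive R draws to a file device such as pdf by default; whether that has anything to do with the shaded area is not established here.

```r
# Hypothetical helper: gather details that may differ between an
# Rscript run and an interactive session.
collect_env_info <- function(pkgs = c("DHARMa", "MASS")) {
  # Installed version of each package, or NA if it is not installed
  pkg_versions <- vapply(pkgs, function(p) {
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))

  # The default graphics device option can be a name, a function, or unset
  dev_opt <- getOption("device")

  list(
    r_version      = R.version.string,
    interactive    = interactive(),
    default_device = if (is.character(dev_opt)) dev_opt else class(dev_opt)[1],
    pkg_versions   = pkg_versions
  )
}

info <- collect_env_info()
str(info)
```

Running this once inside the pipeline and once interactively, then comparing the two outputs, would show whether the R version, package versions, or default graphics device differ.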