Influential data points, leverage, Cook's distance

melina-leite commented 2 months ago

Ideas to develop: how to measure the influence of individual data points in DHARMa, with something like Cook's distance but in a simpler and more general way.
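To make the idea concrete, here is a minimal sketch of a generic case-deletion measure: refit the model without observation i and record how far the coefficients move, scaled by their estimated covariance. This is not something DHARMa provides; the function name `caseDeletionInfluence` and the simulated data are hypothetical, and the sketch assumes only that the model supports `update()`, `coef()` and `vcov()`. For an `lm` fit the result coincides with the classical Cook's distance.

```r
# Hypothetical helper (not part of DHARMa): generic case-deletion influence.
# For each observation i, refit without it and compute
#   (beta_(i) - beta)' V^{-1} (beta_(i) - beta) / p
# where V is the estimated coefficient covariance of the full fit.
caseDeletionInfluence <- function(model, data) {
  beta  <- coef(model)
  V_inv <- solve(vcov(model))
  p     <- length(beta)
  sapply(seq_len(nrow(data)), function(i) {
    refit <- update(model, data = data[-i, , drop = FALSE])
    d <- coef(refit) - beta
    as.numeric(t(d) %*% V_inv %*% d) / p
  })
}

# Simulated example data (an assumption for illustration only)
set.seed(1)
dat <- data.frame(x = rnorm(50))
dat$y <- 1 + 2 * dat$x + rnorm(50)
dat$y[50] <- dat$y[50] + 10   # artificially contaminated observation

fit  <- lm(y ~ x, data = dat)
infl <- caseDeletionInfluence(fit, dat)

# Sanity check: for lm this reproduces stats::cooks.distance()
all.equal(unname(infl), unname(cooks.distance(fit)))
```

The brute-force refit costs one model fit per observation, so it is slow for large data or expensive models, but it does not rely on closed-form leverage and therefore carries over to model classes where no hat matrix is available.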

Some references:

Nieuwenhuis, R., Grotenhuis, M. te, & Pelzer, B. (2012). Influence.ME: Tools for detecting influential data in mixed effects models. R Journal, 4(2), 38–47.
Nobre, J. S., & Singer, J. M. (2011). Leverage analysis for linear mixed models. Journal of Applied Statistics, 38(5), 1063–1072. https://doi.org/10.1080/02664761003759016
Pinho, L. G. B., Nobre, J. S., & Singer, J. M. (2015). Cook’s distance for generalized linear mixed models. Computational Statistics & Data Analysis, 82, 126–136. https://doi.org/10.1016/j.csda.2014.08.008

florianhartig commented 1 month ago

Some code to play around with:

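library(DHARMa)

# Simulate Gaussian data with one fixed effect, no random-effect variance,
# and temporal autocorrelation
testData = createData(sampleSize = 100, family = gaussian(), fixedEffects = 1,
                      randomEffectVariance = 0, temporalAutocorrelation = 10)

fittedModel <- lm(observedResponse ~ Environment1, data = testData)
res = simulateResiduals(fittedModel, n = 1000)

# Standard DHARMa residual plots
plot(res)

# Residuals on the normal scale; values outside the simulations are set to -5 / 5
resid = residuals(res, quantileFunction = qnorm, outlierValues = c(-5,5))

plot(resid~res$fittedPredictedResponse)

# Classical lm diagnostic plots for comparison
par(mfrow=c(2,2))

plot(fittedModel)

qqnorm(resid)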
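
For comparison with the residual plots above, the classical leverage and Cook's distance of an `lm` fit can be pulled out directly with base R. This sketch does not use DHARMa: the data are simulated with base R, reusing the column names `Environment1` and `observedResponse` only as stand-ins, and the cutoffs `2 * p / n` and `4 / n` are common rules of thumb, not thresholds suggested anywhere in this thread.

```r
# Stand-in data simulated with base R (assumption: mimics the columns used above)
set.seed(42)
n <- 100
testData <- data.frame(Environment1 = runif(n, -1, 1))
testData$observedResponse <- testData$Environment1 + rnorm(n)

fittedModel <- lm(observedResponse ~ Environment1, data = testData)

lev  <- hatvalues(fittedModel)        # leverage: diagonal of the hat matrix
cook <- cooks.distance(fittedModel)   # classical Cook's distance
p    <- length(coef(fittedModel))

# Rule-of-thumb flags (heuristics only)
highLeverage <- which(lev > 2 * p / n)
highCook     <- which(cook > 4 / n)

plot(lev, cook, xlab = "Leverage (hat value)", ylab = "Cook's distance")
abline(v = 2 * p / n, h = 4 / n, lty = 2)
```

These quantities exist in closed form only for (generalized) linear models, which is part of the motivation for looking for a more general, simulation-based measure.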

florianhartig commented 1 month ago

Check also

melina-leite commented 1 month ago

https://github.com/florianhartig/DHARMa/issues/171


Influential data points, leverage, cook's distance #428