florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
214 stars 22 forks source link

Residuals for Down-weighted Poisson Regression (DWPR) in DHARMa #214

Open florianhartig opened 4 years ago

florianhartig commented 4 years ago

Question from a user is if DWPR, as described in Renner, Ian W., et al. "Point process models for presence‐only analysis." Methods in Ecology and Evolution 6.4 (2015): 366-379, can be used with DHARMa.

This is an example plot from a DWPR model, fitted with MGCV. The mgcv throws the warning what weights are ignored, which is to be expected. I suspect that the pattern originates from the pseudo-absences, which are set to have low weight during the fit, but in general the model will predict a non-random pattern for those, which will not be matched by the data (which is alway zero)

image

Idea: calculate DHARMa residuals only for presences?

florianhartig commented 4 years ago

OK, I have looked into this. The point for a DWPR regression is that residuals per data point don't make sense, because the weights are necessarily ignored in the simulations (the weighted Poisson does not have an explicit data-generating function), and thus any attempts to build weights on single 0/1 observations are doomed.

What works perfectly fine, however, is to aggregate 0/1 observations and predictions to a grid, create Poisson simulations by hand from that, and read the results back into DHARMa (see example below). If you do this on real data, note that:

# Definining incidence on a grid according to Poisson
gridData = expand.grid(x = 1:50, y = 1:50)
gridData$env = gridData$x/50 * gridData$y/50
gridData$incidence = exp(1 + 0.5 * gridData$env)
gridData$counts = rpois(nrow(gridData ), gridData$incidence)

# Creating point observations
pointData <- gridData[rep(row.names(gridData), gridData$counts), 1:3]
pointData$pres = 1

# Add pseudo-absencens randomly

nPseudo = 10000
pseudo = data.frame(x = sample.int(50, size = nPseudo, replace = T), y = sample.int(50, size = nPseudo, replace = T))
pseudo$env = pseudo$x/50 * pseudo$y/50
pseudo$pres = 0
pointData = rbind(pointData, pseudo)

# creating weights
pointData$weight = rep(1.e-6, nrow(pointData))
pointData$weight[pointData$pres == 0] = nrow(gridData) / sum(pointData$pres == 0)

# fitting the model
fit <- glm(pres/weight ~ env, data = pointData, family = poisson, weights = weight)
summary(fit)

# creating predictions and simulations from the predictions, and read in as DHARMa object
gridData$pred = predict(fit, newdata = gridData, type = "response")
sim = replicate(500, rpois(length(gridData$pred), gridData$pred))

library(DHARMa)
res <- createDHARMa(sim, gridData$counts, fittedPredictedResponse = gridData$pred, integerResponse = T)

# plotting residuals
plot(res)
testSpatialAutocorrelation(res, x = gridData$x, y = gridData$y)

Result:

image

image