florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Identify outliers using DHARMa testOutliers function #354

Open javimorente opened 1 year ago

javimorente commented 1 year ago

Hi, I am pretty new to DHARMa and to GLMM diagnostics using this framework. First of all, I want to thank you for developing the package, and especially for all the information you have provided, which makes it really easy to use. I hope you don't mind my asking some questions that came up while diagnosing models for a rather complicated dataset. The models have a lot of problems (the KS, overdispersion and outlier tests are all significant) and I'm trying to deal with them. I'm modeling the continuous proportion of native trees relative to the total number of native species in a pixel within an island, as follows (best model so far):

glmm.perc.phane.native.bin.b<-glmmTMB(cbind(number.phane.native,number.native-number.phane.native) ~ BIO1+BIO2+ASPECT+ GEOLOGY+LandUse+(1|island), family=binomial, zi=~1, na.action = "na.fail", data=DB)

1) With the aim of cleaning outliers: is there any way to remove the outliers that the testOutliers function identifies? Or is cleaning outliers from the dataset before modeling the best option?

2) Despite the significant overdispersion in my data, a beta-binomial distribution substantially worsens the model fit, so I'm using a binomial family. Is this right?

Best regards and thank you in advance. Javi.

florianhartig commented 1 year ago

Hello Javi,

regarding 1): to get outliers, you can use the outliers() function, as in

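library(DHARMa)  # provides createData(), simulateResiduals(), outliers()
library(lme4)    # provides glmer()

# Simulate overdispersed Poisson data and fit a Poisson GLMM
testData <- createData(sampleSize = 100, overdispersion = 2, family = poisson())
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group),
                     family = "poisson", data = testData)

# Simulate residuals, then list the observations flagged as outliers
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
outliers(simulationOutput)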

However, whether outliers should be removed depends heavily on context. Note also that which observations DHARMa flags as outliers depends on the number of simulations n, so you should read the help of outliers() for more information!
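
To illustrate the dependence on n, here is a minimal sketch that reruns the simulation for the same kind of test model with two different values of n and compares which rows are flagged. The values 100 and 1000 and the seed are arbitrary choices for illustration, and the flagged rows will differ for other data.

```r
library(DHARMa)
library(lme4)

set.seed(123)  # arbitrary seed, only for reproducibility of the sketch

# Same kind of test data and model as in the example above
testData <- createData(sampleSize = 100, overdispersion = 2, family = poisson())
fittedModel <- glmer(observedResponse ~ Environment1 + (1 | group),
                     family = "poisson", data = testData)

# Simulate residuals with a small and a large number of simulations
resSmall <- simulateResiduals(fittedModel = fittedModel, n = 100)
resLarge <- simulateResiduals(fittedModel = fittedModel, n = 1000)

# outliers() returns the row indices of the flagged observations
idxSmall <- outliers(resSmall)
idxLarge <- outliers(resLarge)

# Compare how many observations are flagged in each case
length(idxSmall)
length(idxLarge)

# Inspect the flagged rows before deciding what to do with them
testData[idxLarge, ]
```

Looking at the flagged rows in the original data is usually more informative than deleting them automatically, because the flag only says that an observation lies outside the range of the simulated values.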

regarding 2): if the beta-binomial doesn't fit, you can consider an observation-level random effect (RE), as described in https://peerj.com/articles/616/
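
A minimal sketch of what an observation-level RE could look like in glmmTMB, using simulated data rather than the dataset from the question. The data frame dat and the columns success, total, x and obs are hypothetical names made up for this illustration, and the zero-inflation term from the original model is left out to keep the example short.

```r
library(glmmTMB)
library(DHARMa)

set.seed(1)  # arbitrary seed, only for reproducibility of the sketch

# Hypothetical data: counts of successes out of a total, grouped by island
n <- 200
dat <- data.frame(
  island = factor(sample(letters[1:10], n, replace = TRUE)),
  x      = rnorm(n),
  total  = rpois(n, 20) + 1
)
islandEffect <- rnorm(10, sd = 0.5)

# Extra noise per observation on the logit scale creates overdispersion
eta <- -0.5 + 0.8 * dat$x + islandEffect[as.integer(dat$island)] +
  rnorm(n, sd = 1)
dat$success <- rbinom(n, size = dat$total, prob = plogis(eta))

# One factor level per row: this is the observation-level grouping variable
dat$obs <- factor(seq_len(n))

# Plain binomial GLMM
mBinom <- glmmTMB(cbind(success, total - success) ~ x + (1 | island),
                  family = binomial, data = dat)

# Same model with an additional observation-level random effect
mOLRE <- glmmTMB(cbind(success, total - success) ~ x + (1 | island) + (1 | obs),
                 family = binomial, data = dat)

# Compare the DHARMa dispersion test for both models
testDispersion(simulateResiduals(mBinom), plot = FALSE)
testDispersion(simulateResiduals(mOLRE), plot = FALSE)
```

The only change between the two models is the extra (1 | obs) term, so the same idea should carry over to a model with more predictors by adding a row-index factor to the data and including it as a random intercept.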

Best, Florian

javimorente commented 1 year ago

Thank you very much Florian. That was useful for me. I will work in that direction.