florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
200 stars 21 forks source link

Inconsistent results between performance::check_zeroinflation and DHARMa::testZeroInflation for a glmmTMB negative binomial GLMM #392

Closed PalmyreHBoucherie closed 7 months ago

PalmyreHBoucherie commented 8 months ago

Dear Prof. Hartig,

First I'd like to thank you for the excellent tool you developed. DHARMa is indeed of great help for most of my analyses !

If I allow myself to write you today, is because I encountered a puzzling output in DHARMa, regarding zero-inflation. Basically, when dealing with zero-inflated models, I usually use both check_zeroinflation() from the performance package, and testZeroInflation() from the DHARMa package.

However, I currently encountered a recurrent issue on several of my models, with inconsistent results between check_zeroinflation() and testZeroInflation().

This happened with glmmTMB, and Poisson or Negative binomial zeroinflated models with either a single zeroinflation parameter applied to all observations (ziformula~1), or when specifying a zeroinflated model (e.g., ziformula = ~ factor.a + factor.b).

In either cases, when applying the ziformula, DHARMa testZeroInflation() suggests that the issue is gone, while the check_zeroinflation() remains positive for zeroinflation, with a relatively high ratio.

Thanks for your help !

Palmyre

I add here an example:

I model here count data of cofeeding instances between captive ravens. I do not show the zero-inflated Poisson model, since the model seemed to be overdispersed (but note that I ran into the same issue with that model).

Using nbinom1 to deal with the overdispersion:

Cofeeding.nb1_z = glmmTMB(Cofeeding ~ offset(log(N_sessions)) + Year + family_size + Week + Condition + dyad_identity + family_size:Week + family_size:Condition + family_size:dyad_identity + Week:Condition + Week:dyad_identity + Condition:dyad_identity + (1 |Family_Year) + (1 |Dyad) + (1 |ID1) + (1 |ID2), data = data_FoodEx_dyad, family = nbinom1, ziformula = ~1)

check_zeroinflation(Cofeeding.nb1_z)

Screen Shot 2023-10-19 at 11 25 05

simulationOutput_Cofeeding.nb1_z <- simulateResiduals(fittedModel = Cofeeding.nb1_z, plot = F) plot(simulationOutput_Cofeeding.nb1_z) DHARMa residuals

testDispersion(simulationOutput_Cofeeding.nb1_z) testZeroInflation(simulationOutput_Cofeeding.nb1_z)

Zeroinflation and Overdispersion

PalmyreHBoucherie commented 8 months ago

Issue zeroinflation_DHARMa.csv

florianhartig commented 8 months ago

Hello Palmyre

OK, I have looked at your example from 2 angles: first of all, to see how the packages perform in a simple situation, I have simulated data from a Poisson distribution (i.e. no zero-inflation) and checked the behaviour of the performance package for Poisson and NegBinom GLMs. As you can see if you run this code (also linked on the rep, see link above), performance:: check_zeroinflation does not seem to work correctly for NegBinon - in my simulation, it always reported a zero-deficit, which is clearly wrong.

--> for the moment, I would conclude that there must be a bug in performance:: check_zeroinflation, and I would disregard all its outputs, at least for NegBinom models. I will report this to the performance crew.

dat <- createData(sampleSize = 1000)
fit <- glmmTMB(observedResponse ~ Environment1 + (1|group), data = dat, family = "poisson")

res <- simulateResiduals(fit, plot = T)
testZeroInflation(res)
check_zeroinflation(fit) # performance package calcualtes correct value

fit <- glmmTMB(observedResponse ~ Environment1 + (1|group), data = dat, family = "nbinom1")

res <- simulateResiduals(fit, plot = T)
testZeroInflation(res)
check_zeroinflation(fit)

For your specific example: in general, it seems to me that you have a very complex model - I had problems fitting it, and after updating glmmTMB and Matrix to the last version, the second model (with ZIP term) actually didn't converge. So my results are based on the previous glmmTMB version.

Here, my conclusions were the following:

My conclusion would thus be that DHARMa is / was right and there is no zero-inflation problem in your model.

dat = read.csv2("Code/DHARMaIssues/392.csv")

library(glmmTMB)
library(DHARMa)
library(performance)

# model without zip term
m0 = glmmTMB(Cofeeding ~ offset(log(N_sessions)) + Year + family_size + Week + Condition + dyad_identity + family_size:Week + family_size:Condition + family_size:dyad_identity + Week:Condition + Week:dyad_identity + Condition:dyad_identity + (1 |Family_Year) + (1 |Dyad) + (1 |ID1) + (1 |ID2), data = dat, family = nbinom1)

summary(m0)

res <- simulateResiduals(m0)
plot(res)
testZeroInflation(res) # no indication of zero-inflation
check_zeroinflation(m0) # suggests zero-inflation

# model with zip term
m1 = glmmTMB(Cofeeding ~ offset(log(N_sessions)) + Year + family_size + Week + Condition + dyad_identity + family_size:Week + family_size:Condition + family_size:dyad_identity + Week:Condition + Week:dyad_identity + Condition:dyad_identity + (1 |Family_Year) + (1 |Dyad) + (1 |ID1) + (1 |ID2), data = dat, family = nbinom1, ziformula = ~1)

res <- simulateResiduals(m1)
plot(res)
testZeroInflation(res)
check_zeroinflation(m1)

summary(m1) # note ZI estimate is large negative, i.e. there is no zero-inflation fitted, the model basically says that there is no zero-inflation present in this data.

# this is also confirmed by the AIC 
AIC(m0)
AIC(m1)

p.s.: In the development version of DHARMa, I have recently extended the help of testZeroInflation to clarify a few misunderstandings regarding the function and zero-inflation tests in general. This is the text of the new help:

Zero-inflation can occur for a number of reasons other than an underlying data generating process corresponding to a ZIP model. Vice versa, it is very well possible that no zero-inflation will be observed when fitting models to data derived from a ZIP process. The latter is due to the fact that excess zeros can often be explained by other model parameters, such as the theta parameter in the negative binomial.

For this reason, results of the zero-inflation test should be interpreted as a residual pattern that can have many reasons, not as a decision criterion for whether or not to fit a ZIP model. To decide whether to add a ZIP term, I would advise relying on appropriate model selection techniques such as AIC, BIC, WAIC, Bayes factor, or LRT. Note that these tests are often not reliable in GLMMs because it is difficult to determine the df spent by the different models. The simulateLRT function in DHARMa provides a nonparametric alternative to obtain p-values for LRT is nested models with unknown df.

PalmyreHBoucherie commented 8 months ago

Many thanks for your detailed feedback ! It really helped.

I will thus double-check all ZI estimate from my different zero-inflated models, and focus on DHARMa output to make a decision on the final model structures.

In case the performance crew would identify another explanation for this discrepancy than a bug in their performance::check_zeroinflation function, could you let me know?

Best, Palmyre

PalmyreHBoucherie commented 8 months ago

Just a follow-up question, you found the model very complex, and seem to have convergence issues depending on the version of glmmTMB used. The structure of the model was defined on the knowledge we have from the system, and thus indeed includes quite a large number of factors and interactions, but that we believe may be of relevance in explaining the observed patterns.

I personally had no convergence issues (at least with my version of glmmTMB), thus no red flag on model complexity until now.

But since you noted it, I wonder if we should raise concern about it? The other residual diagnostics of the model seemed ok to me, but I may have missed something important.

And about the zero-inflation tests, since you found a similar issue on simulated data, I guess we can safely say that the discrepancy does not result from the model itself?

Thanks, P.

florianhartig commented 8 months ago

Hello Palmyre,

convergence is not a model selection tool, i.e. you might very well have the right model structure even if it doesn't converge. Convergence is a practical / numerical problem that needs to be solved if you encounter it. If you don't encounter any convergence problems, I would not worry about it.

About the zero-inflation: yes, afaiks, you have no zero-inflation problem.

PalmyreHBoucherie commented 8 months ago

Dear Florian,

Thanks once more for your very valuable feedback ! Highly appreciated !

Best, P.