ejolly / pymer4

All the convenience of lme4 in python
http://eshinjolly.com/pymer4
MIT License
192 stars 26 forks source link

Cannot get ANOVA p-values for binomial GLMMs #124

Open cherepaha opened 1 year ago

cherepaha commented 1 year ago

I am trying to get an ANOVA table for a GLMM model with binomial dependent variable. However, regardless of the data I'm actually using, the ANOVA table I'm getting following the tutorial only has columns for SS, MS, and F-stat, but misses estimated DF or p-values - which I ultimately need. It does give a warning about missing p-value computations, but it does so even if model.warnings is empty

Here is the minimal working example

import os
import pandas as pd
from pymer4.models import Lmer
from pymer4.utils import get_resource_path

df = pd.read_csv(os.path.join(get_resource_path(), "sample_data.csv"))
model = Lmer("DV_l ~ IV3 + (IV3|Group)", data=df, family="binomial")
model.fit(factors={"IV3": ["1.0", "0.5", "1.5"]})
model.anova()

Output: UserWarning: MODELING FIT WARNING! Check model.warnings!! P-value computation did not occur because lmerTest choked. Possible issue(s): ranefx have too many parameters or too little variance...

With sample_data.csv, inspecting model.warnings tells me that the model fit is singular. However, I originally encountered the issue with my own data for which the model fit is not singular. Unfortunately I cannot share the data, but when I fit my binomial model to it, model.warnings is empty, the random effect variances are not even close to 0, and random effect correlations are far from +/-1. So this made me think the issue is not with the model/data structure.

When googling this, I stumbled upon this thread. My interpretation of it is that getting p-values for binomial GLMMs is not provided out-of-the-box by lme4 and needs to be implemented separately (the thread seems to have a concise solution). If that's indeed the case, I would really appreciate if you can add support for this in pymer.

ejolly commented 1 year ago

Hmm yea this seems to be a limitation of the anova() function in R when called on glmer models, but not lmer models. If you check out the results of model.fit() you can see that it is possible to get p-values for each of your factor contrasts. These are calculated for all models in pymer4 using the lmerTest library. But p-values are not returned for the omnibus F-tests even when reproducing your example in R.

This won't be trivial to implement as the solution you linked to requires the afex package which isn't currently a dependency of pymer4. Adding it not only requires quite a few code changes due to the intricacies of parsing package specific inputs/outputs and python object, but is also likely to introduce installation/maintenance challenges (pymer4 would need to build with r-afex). Unfortunately, just to add this functionality, I'm not sure I'll realistically get around to adding this any time soon.

Apologies, but happy to take PRs!