Hi, and thanks for maintaining such a fantastic package!
I removed most of the issue template since I think this is a pretty straightforward bug and not related to my environment/system/specific data/etc., but if you'd like a more complete report with that info, please let me know!
When computing a likelihood-ratio test between 2 (or more) Lmer models, pymer4.stats.lrt() computes the degrees of freedom shown in the output table based on the difference in the number of parameters between the models: https://github.com/ejolly/pymer4/blob/4605b460e244621b56ef7786253de7f27e4fe77f/pymer4/stats.py#L592

But then to compute the p-value for the test, pymer4.utils._lrt() re-computes the degrees of freedom based on the number of rows in the two models' .coefs attributes (pandas DataFrames): https://github.com/ejolly/pymer4/blob/4605b460e244621b56ef7786253de7f27e4fe77f/pymer4/utils.py#L635
which, for an Lmer model, includes only the fixed effects. So if the random effects differ between the two models, the p-value will be wrong.
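To make the failure mode concrete, here's roughly the kind of comparison I mean: two models with identical fixed effects whose random-effects structures differ, so the .coefs row counts are equal and the df used for the p-value collapses to zero. (The formulas and the bundled sample_data.csv below are just placeholders for illustration, not my actual analysis.)

```python
import os
import pandas as pd
from pymer4.models import Lmer
from pymer4.stats import lrt
from pymer4.utils import get_resource_path

# Placeholder data and formulas: both models have the same fixed effects
# (so .coefs has the same number of rows) but different random effects.
df = pd.read_csv(os.path.join(get_resource_path(), "sample_data.csv"))
mod1 = Lmer("DV ~ IV2 + (1 | Group)", data=df)
mod2 = Lmer("DV ~ IV2 + (IV2 | Group)", data=df)
mod1.fit()
mod2.fit()

# The df column of this table (based on _get_params) is right, but the
# p-value is derived from the .coefs row counts, which are identical here.
print(lrt([mod1, mod2]))
```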
I think this can pretty easily be fixed by using pymer4.utils._get_params(mod) instead of mod.coefs.shape[0] in pymer4.utils._lrt(). E.g., to ensure the function still works with Lm and Lm2 model classes, something like:
```python
def _lrt(tup):
    """Likelihood ratio test between 2 models. Used by stats.lrt"""
    from .models import Lmer  # local import to avoid circular dependency

    d = np.abs(2 * (tup[0].logLike - tup[1].logLike))
    n_params_mod1 = _get_params(tup[0]) if isinstance(tup[0], Lmer) else tup[0].coefs.shape[0]
    n_params_mod2 = _get_params(tup[1]) if isinstance(tup[1], Lmer) else tup[1].coefs.shape[0]
    return chi2.sf(d, np.abs(n_params_mod1 - n_params_mod2))
```
Or since the test statistic and df are both already computed in the outer pymer4.stats.lrt() function, it might be simpler to just move the p-value calculation there and remove this helper function entirely.
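That is, something along these lines, where chisq and df_diff stand in for the statistic and parameter-count difference that stats.lrt() already has on hand (the helper name here is just for illustration):

```python
from scipy.stats import chi2

def _lrt_pvalue(chisq, df_diff):
    # chisq: |2 * (logLik_1 - logLik_2)|; df_diff: difference in parameter
    # counts from _get_params(). Both are already computed in stats.lrt(),
    # so the p-value could be derived there and utils._lrt() dropped.
    return chi2.sf(chisq, df_diff)
```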
I've confirmed that the p-value computed this way (i.e., using _get_params()) matches the p-value given by anova(mod1, mod2) in R, as do all other values in the DataFrame returned by pymer4.stats.lrt().
Thanks again!