JuliaStats / MixedModels.jl

A Julia package for fitting (statistical) mixed-effects models
http://juliastats.org/MixedModels.jl/stable
MIT License

StatsBase Modelling API #514

Open palday opened 3 years ago

palday commented 3 years ago
dmbates commented 3 years ago

I can work on some of these if that would help.

palday commented 3 years ago

That would be great.

dmbates commented 3 years ago

I'm wondering how far down the rabbit hole to jump with this one. (It will probably be very far if past experience is a guide.) It seems we are inconsistent on some methods between LMMs and GLMMs. In particular, for `dof` an LMM doesn't add in but the method for a GLMM does.

So my plan is:

Does that seem to be giving myself enough rope to mess things up?

palday commented 3 years ago

Yes. And I think it makes sense to have the StatsBase API just redirect to various properties. That way people can still abstract away via the functional interface when e.g. writing plotting recipes and the like.
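A minimal sketch of that "redirect to properties" pattern, assuming hypothetical property names (`m.β`, `m.loglikelihood`) that may not match the actual internals:

```julia
import StatsBase

# Sketch only: forward each StatsBase generic to a model property, so that
# downstream code (plotting recipes, tables, etc.) can stay with the
# functional interface. The property names here are illustrative
# assumptions, not the real fields of LinearMixedModel.
StatsBase.coef(m::LinearMixedModel) = m.β
StatsBase.loglikelihood(m::LinearMixedModel) = m.loglikelihood
```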

I'm not sure how well some of these are defined / meaningful for the mixed-model case. The RSS is clearly defined, but the PWRSS is much more interesting. I almost feel like adding these functions is as much a docs issue as writing a few short functions.

One other thing I would like to add to the docs is a more explicit discussion of why we have the actual deviance and log likelihood for GLMM, but only -2 log likelihood for LMM.

dmbates commented 3 years ago

Should the functions for which we have defined methods be re-exported?

palday commented 3 years ago

Yes.

dmbates commented 3 years ago

What do you think we should do about our inconsistency on whether the covariance parameters should be counted in `dof_residual`? We do count them consistently in `dof`, so the question is whether we think the relationship

```julia
dof_residual(m) = nobs(m) - dof(m)
```

should be preserved (I don't think so, because `dof` counts 1 for the per-observation noise variance) or we should just use

```julia
dof_residual(m) = nobs(m) - m.feterm.rank
```

The more ambitious definition would be to evaluate `sum(leverage(m))` as the trace of the hat matrix. Of course, then we would need to explain and contend with all the "but the right answer from SAS/lme4 is ..." comments.

palday commented 3 years ago

I think the question should be: what do we want for the default? Because we can also support additional methods `dof_residual(::LinearMixedModel; variant::Symbol)` where `variant` is one of `(:leverage, :ferank, :nparam)`.
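A hedged sketch of how that keyword method could look, reusing `m.feterm.rank` from earlier in this thread; the default variant chosen here is only a placeholder, since that default is exactly the open question:

```julia
import StatsBase

# Sketch of the variant keyword; which variant should be the default is
# the open design question, :ferank is a placeholder.
function StatsBase.dof_residual(m::LinearMixedModel; variant::Symbol=:ferank)
    n = nobs(m)
    variant === :nparam   && return n - dof(m)            # count all parameters
    variant === :ferank   && return n - m.feterm.rank     # fixed-effects rank only
    variant === :leverage && return n - sum(leverage(m))  # trace of the hat matrix
    throw(ArgumentError("unknown variant: $variant"))
end
```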

At one point you made a pretty compelling argument for `sum(leverage(m))`. If I'm not mistaken, that would also allow us to use t instead of z in our `CoefTable` (of course, we would then need a direct dependency on Distributions.jl to compute the p-values). (If I am mistaken, please do explain which things I'm mixing up!) But then we run into computational complexity issues with `leverage`.

But regardless of the path we go here, we should probably add a page to the docs on degrees of freedom. You have some good old posts in various mailing lists and fora that could serve as the basis, once again driving home the point that a lot of "obvious" things in classical regression just break down in the mixed-effects context. This will also hopefully give us a place to point to on why Satterthwaite and Kenward-Roger approximations may not help as much as people think.

palday commented 3 years ago

Looking at the remaining items, here are my thoughts:

dmbates commented 3 years ago

Several of these may require us to decide if we want to generalize the evaluation of the terms in a model or generalize the concept.

For example, the formula for Cook's Distance involves the standardized residuals and the leverage values. We need to make sure that those are evaluated in ways that are consistent. In other words, if we use the penalized residual sum of squares in standardizing the residuals (I'm not even sure that would make sense) then we need to decide what the effective degrees of freedom for residuals should be, etc. So I think more will be involved here than trying to generalize terms in a formula.
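For reference, the classical (fixed-effects-only) Cook's distance shows exactly the two ingredients that would need consistent mixed-model analogues, the residual scaling and the leverage:

```julia
# Classical Cook's distance for observation i:
#   D_i = (e_i^2 / (p * s²)) * h_i / (1 - h_i)^2
# where e_i is the residual, h_i the leverage, p the number of coefficients,
# and s² the residual variance estimate. The mixed-model question above is
# which residual scaling (RSS vs. PWRSS) and which effective residual
# degrees of freedom to plug in here.
cooks_distance(e, h, p, s²) = (e^2 / (p * s²)) * h / (1 - h)^2
```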

For the cross model matrix there is a simple answer of `p = m.feterm.rank; Symmetric(view(last(m.A), 1:p, 1:p))` if we ignore the blocking factors, which I think we probably should.
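That one-liner could be wrapped as a method for the StatsBase generic `crossmodelmatrix`; this is a sketch assuming the block layout of `m.A` described above:

```julia
import StatsBase
using LinearAlgebra: Symmetric

# Sketch: take the upper-left p×p block of the last block of m.A, per the
# suggestion above, ignoring the blocking factors.
function StatsBase.crossmodelmatrix(m::LinearMixedModel)
    p = m.feterm.rank
    return Symmetric(view(last(m.A), 1:p, 1:p))
end
```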

For `confint` I think we should use the Wald approximation and a docstring to promote `shortestcovint`. For the fixed-effects parameters this will likely be sufficient. For confidence intervals on the variance components or correlation parameters I think we need to insist on using a bootstrap sample or the yet-to-be-written profiling results.
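A sketch of the Wald interval for the fixed effects, assuming the standard `coef`/`stderror` accessors and hard-coding the 95% normal quantile to keep the example free of a Distributions.jl dependency:

```julia
# Wald 95% CI sketch: β̂ ± z_{0.975} * SE. As noted above, this is only
# appropriate for the fixed-effects parameters; for variance components or
# correlation parameters, use a bootstrap sample or profiling instead.
function wald_confint95(m)
    z = 1.959963984540054  # quantile(Normal(), 0.975), hard-coded for the sketch
    β, se = coef(m), stderror(m)
    return β .- z .* se, β .+ z .* se
end
```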

For the information matrix I think we should pass. If attention is only on the fixed-effects parameters this is essentially like evaluating the cross model matrix. If you throw in variance components and correlation parameters then it is probably not a good idea to report it because it assumes that the log-likelihood is quadratic in the parameters of interest, which it isn't.