JuliaStats / MixedModels.jl

A Julia package for fitting (statistical) mixed-effects models
http://juliastats.org/MixedModels.jl/stable
MIT License

GLMMs #31

Closed kleinschmidt closed 8 years ago

kleinschmidt commented 9 years ago

I'm wondering what, if any, plans there are for support of GLMMs in this package.

I'd be happy to help, but after doing some reading about the general methods that are used to approximate the likelihood for GLMMs in lme4 I'm thinking that honestly it will take a lot to get myself up to speed. However, much of the data I deal with is binary or count data, so I'm highly motivated (as are some of my colleagues) and have a bit more free time over the summer to work on it.

The most obvious place, to me, to start is to look at the source code for lme4, but as far as I know that's verboten because it's GPL-licensed and so anything derived from it can't be included in this package which is MIT-licensed. Is that correct?

dmbates commented 9 years ago

Thanks for your interest in this package.

GLMMs are definitely on the "To Do" list but not at the top yet. I am currently working on reformulating the underlying penalized least squares (PLS) problem for linear mixed models (LMMs). To avoid making too big a mess of this package I created another package, ReTerms, for that work.

As you point out, the lme4 code is under the GPL and hence code derived from it would also need to fall under the GPL. Thus it is best to avoid reading the GPL'd code. I can base my code on those methods because I wrote much of the lme4 code and developed the methods. In any case, the lme4 code is not easy to read and I wouldn't start with that.

One thing that would help is to get an idea of what kinds of models and data sets you use. There are always trade-offs in where to concentrate and whether to aim for full generality or to concentrate on a few special cases. I have the feeling that the ability to fit Bernoulli, Binomial and Poisson families with the canonical links would encompass most applications but I have no objective evidence to back this up.
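As background on the canonical-link cases mentioned above, here is a minimal, language-agnostic sketch (in Python, not the GLM.jl or MixedModels.jl API) of the link pairs for the families under discussion; the Binomial family shares the Bernoulli logit link:

```python
import math

# Each family maps the mean mu to the linear predictor eta via its
# canonical link g, with inverse link g^{-1} mapping back.
links = {
    "bernoulli": (lambda mu: math.log(mu / (1.0 - mu)),       # logit
                  lambda eta: 1.0 / (1.0 + math.exp(-eta))),  # logistic
    "poisson":   (lambda mu: math.log(mu),                    # log
                  lambda eta: math.exp(eta)),                 # exp
}

link, inv_link = links["bernoulli"]
eta = link(0.25)    # mean scale -> linear-predictor scale
mu = inv_link(eta)  # and back: recovers the original mean
```

Restricting to these canonical pairs keeps the optimization simple because the expected and observed information coincide for canonical links.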

We can carry on this discussion via email if you prefer.

kleinschmidt commented 9 years ago

Let's continue via email, but I'll note here for the record the kind of data I and my colleagues are typically dealing with.

All of the data that I (and, AFAIK, my colleagues) are analyzing with GLMMs is covered by that case (canonical link functions and Bernoulli, Binomial, & Poisson families). Here's an example of a dataset I'm working with at the moment, in the form of an R package: https://github.com/kleinschmidt/phonetic-sup-unsup

In the full dataset there are ~90k observations, and in the GLMMs I think I had something like four fixed effect predictors (plus all their interactions) with random slopes for two of them (plus one interaction) at one grouping level. It wasn't an issue in this experiment, but crossed random effects (with random slopes) are very common in our work, which is psycholinguistics and has subjects and items as grouping factors. So that's the main, non-trivial concern for us.

mguzmann commented 9 years ago

Would it be possible to follow the progress of this? I am very interested since I use GLMMs in a lot of my work...


dmbates commented 9 years ago

Some thoughts on what needs to be done to incorporate GLMMs. These are technical considerations that may seem like gibberish, but they do relate to the overall difficulty of the task.

Tasks for extending the MixedModels package to fit GLMMs:

- (Update: currently using the Laplace approximation)
- (Update: currently, fixed effects are always part of the general optimization)

In the [lme4 package](https://github.com/lme4/lme4) for [R](http://www.R-project.org) we chose to do both. In the first stage the fixed-effects parameters are part of the PIRLS optimization and only the covariance parameters are in the general optimizer. In the second stage only the random effects are in the PIRLS optimization and the fixed effects are in the general optimizer. I think it would be simpler to omit the first stage for the time being and keep only the random effects in the PIRLS optimization.
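To make the PIRLS step above concrete, here is a hedged sketch (Python with NumPy, not the package's code) of finding the conditional mode of the random effects for a Bernoulli/logit model with a single scalar covariance parameter `lam`; the names and the restriction to one variance component are assumptions for illustration:

```python
import numpy as np

def pirls(y, X, Z, beta, lam, tol=1e-8, maxiter=50):
    """Conditional mode of spherical random effects u for a Bernoulli/logit
    GLMM: eta = X @ beta + Z @ (lam * u), with u ~ N(0, I) a priori.
    Each iteration is a Newton step on the penalized log-likelihood; with
    the canonical link the expected and observed information coincide, so
    this is iteratively reweighted least squares with an L2 penalty on u."""
    u = np.zeros(Z.shape[1])
    for _ in range(maxiter):
        eta = X @ beta + Z @ (lam * u)
        mu = 1.0 / (1.0 + np.exp(-eta))                 # inverse logit
        w = mu * (1.0 - mu)                             # GLM working weights
        grad = lam * (Z.T @ (y - mu)) - u               # penalized score
        hess = lam**2 * (Z.T * w) @ Z + np.eye(len(u))  # penalized information
        step = np.linalg.solve(hess, grad)
        u = u + step
        if np.linalg.norm(step) < tol:                  # converged at the mode
            break
    return u
```

In the second-stage scheme described above, this inner loop runs at fixed `beta` and `lam` inside each evaluation made by the general (derivative-free) optimizer.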

kleinschmidt commented 9 years ago

Thanks, having that list is very helpful. I think that requiring GLM.jl seems like the best way to go, rather than duplicating code between the two packages. I also think that, to start, it might be good to just always assume the canonical link. That will simplify development and in no way precludes us from including support for other link functions later on.

I've been reading the Pinheiro & Chao (2006) paper that describes the Laplacian and AGQ approximations to the likelihood and includes a discussion of finding the conditional modes via PIRLS. I take it this is what you're suggesting?

I don't have a strong opinion about whether to include the fixed effects themselves in the PIRLS optimization. I take it that, if they're not, then they will be optimized by the general optimizer used to update the variance-covariance parameters based on the approximate likelihood?
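For reference, the Laplace approximation under discussion replaces the integral over the random effects with a second-order expansion at the conditional mode found by PIRLS. In lme4-style notation (spherical random effects $u$, so $\eta = X\beta + Z\Lambda_\theta u$), a sketch of the approximated deviance that the general optimizer would minimize over $\beta$ and $\theta$ is:

```latex
% Sketch of the Laplace-approximate deviance; details may differ from
% the eventual MixedModels.jl implementation. Here d(y_i, mu_i) are the
% unit deviances, W the GLM working weights, and \tilde{u} the
% conditional mode of the random effects found by PIRLS.
-2\,\ell_{\mathrm{Laplace}}(\beta,\theta)
  = \sum_i d(y_i, \mu_i)\Big|_{u=\tilde u}
  + \|\tilde u\|^2
  + \log\left|\Lambda_\theta^\top Z^\top W Z \Lambda_\theta + I\right|
```

AGQ refines this by evaluating the integrand at several quadrature points around $\tilde u$ instead of just one, which is feasible when the random effects are associated with a single grouping factor.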

dmbates commented 8 years ago

GLMMs are now under active development.

I have created a branch called glmms which includes the GLM package. I'm currently working on the PIRLS algorithm.

dmbates commented 8 years ago

I have merged the glmms branch. Still working on the PIRLS algorithm implementation.