JuliaStats / MixedModels.jl

A Julia package for fitting (statistical) mixed-effects models
http://juliastats.org/MixedModels.jl/stable
MIT License
403 stars 47 forks source link

NLopt error for Bernoulli models with random slopes when NOT using fast=true #220

Closed hikea closed 4 years ago

hikea commented 4 years ago

Something strange is happening with Bernoulli models. Normal seems to run okay.

Also, unrelated to this issue, but the package cannot handle arrays with missing values out of the box.

using DataFrames, MixedModels, GLM, RDatasets, RData, RCall, Missings, CSV

lexdec = rcopy(R"languageR::lexdec")

fm = fit(MixedModel, @formula(Correct ~ 1 + Class*NativeLanguage*Frequency + Trial + (1+Class + Frequency + Trial|Subject) + (1|Word)), lexdec, Bernoulli())
ERROR: ArgumentError: invalid NLopt arguments: zero step size
Stacktrace:

julia> fm = fit!(GeneralizedLinearMixedModel(@formula(Correct ~ 1 + Class*NativeLanguage*Frequency + (1+Class + Frequency + Trial|Subject) + (1|Word)), lexdec, Bernoulli()))
ERROR: ArgumentError: invalid NLopt arguments: zero step size
Stacktrace:

julia> fm = fit!(GeneralizedLinearMixedModel(@formula(Correct ~ 1 + Class*NativeLanguage*Frequency + (1+Class + Frequency + Trial|Subject) + (1|Word)), lexdec, Bernoulli()), fast=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  Correct ~ 1 + Class + NativeLanguage + Frequency + Class & NativeLanguage + Class & Frequency + NativeLanguage & Frequency + Class & NativeLanguage & Frequency + (1 + Class + Frequency + Trial | Subject) + (1 | Word)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 475.6604

Variance components:
             Column        Variance      Std.Dev.    Corr.
 Subject (Intercept)   1.8648165158988 1.365582848
         Class: plant  0.6393674729270 0.799604573 -0.49
         Frequency     0.1061836740609 0.325858365 -0.94  0.73
         Trial         0.0000076274775 0.002761789  0.93 -0.21 -0.75
 Word    (Intercept)   0.7313290831756 0.855177808
 Number of obs: 1659; levels of grouping factors: 21, 79

Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────────────────
                                                   Estimate  Std.Error    z value  P(>|z|)
──────────────────────────────────────────────────────────────────────────────────────────
(Intercept)                                       -1.11513    0.839064  -1.32902    0.1838
Class: plant                                      -0.721878   1.14746   -0.629111   0.5293
NativeLanguage: Other                              0.913713   1.03829    0.88002    0.3788
Frequency                                         -0.495801   0.174133  -2.84725    0.0044
Class: plant & NativeLanguage: Other               1.20139    1.33366    0.900816   0.3677
Class: plant & Frequency                          -0.140558   0.274048  -0.512895   0.6080
NativeLanguage: Other & Frequency                 -0.118276   0.221975  -0.532835   0.5941
Class: plant & NativeLanguage: Other & Frequency  -0.263183   0.336882  -0.781232   0.4347
──────────────────────────────────────────────────────────────────────────────────────────
palday commented 4 years ago

Duplicate of #201

palday commented 4 years ago

Thanks for reporting this -- it's good to have a test case for the boundary issue that isn't dependent on using gamma regression, which is generally less well tested than logistic regression.

palday commented 4 years ago

Also, unrelated to this issue, but the package cannot handle arrays with missing values out of the box.

The development version can handle missing values (by dropping them), but the more general problem is more or less inherited from JuliaStats/StatsModels.jl#17, JuliaStats/StatsModels.jl#145 and JuliaStats/StatsModels.jl#153. Please don't mix unrelated tidbits -- they distract or will be ignored. When we fix the boundary fit problem then this issue will be closed and that extra bit of info will disappear.

hikea commented 4 years ago

Thank you for the tip on missing - the development version did the trick.

About the model above. I think it is problematic model that should not converge. It should give warnings. This model doesn't converge in R with either glmer or glmmTMB. Both give warnings. On close examination, it looks like the random slope for Trial doesn't account for much variance (see below). rePCA from lme4 can also help to figure this out. So the model is overparametrized.

The model converges in Julia and in R with glmmTMB (but not either glmer) without the slope for Trial.

I remember dmbates objected the idea of giving the user warnings about convergence and potentially model misspecifications, but it seems that in this case it would have helped. Again, the problematic fit should be obvious based on the variance estimate for the random slope for Trial, but Julia ambiguously doesn't converge without fast=true. The user might be under the impression that the model is ok, but there is a problem with MixedModels.jl


> fm <- glmer(Correct ~ 1 + Class*NativeLanguage*Frequency + Trial + (1+Class + Frequency + Trial|Subject) + (1|Word), lexdec, family="binomial")
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 1.16397 (tol = 0.001, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

> summary(fm)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Correct ~ 1 + Class * NativeLanguage * Frequency + Trial + (1 +  
    Class + Frequency + Trial | Subject) + (1 | Word)
   Data: lexdec

     AIC      BIC   logLik deviance df.resid 
   513.2    621.5   -236.6    473.2     1639 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.1224 -0.1865 -0.1193 -0.0731  7.7318 

Random effects:
 Groups  Name        Variance  Std.Dev. Corr             
 Word    (Intercept) 8.690e-01 0.932178                  
 Subject (Intercept) 1.345e+00 1.159747                  
         Classplant  7.804e-01 0.883404 -0.37            
         Frequency   9.266e-02 0.304398 -0.94  0.52      
         Trial       1.435e-05 0.003788  0.87 -0.13 -0.66
Number of obs: 1659, groups:  Word, 79; Subject, 21

Fixed effects:
                                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                              -1.048216   1.423579  -0.736   0.4615
Classplant                               -0.916008   1.954159  -0.469   0.6392
NativeLanguageOther                       0.826676   1.553323   0.532   0.5946
Frequency                                -0.561306   0.295837  -1.897   0.0578
Trial                                    -0.001595   0.003758  -0.424   0.6713
Classplant:NativeLanguageOther            1.621867   2.082790   0.779   0.4362
Classplant:Frequency                     -0.164784   0.469251  -0.351   0.7255
NativeLanguageOther:Frequency            -0.090174   0.332856  -0.271   0.7865
Classplant:NativeLanguageOther:Frequency -0.350099   0.525350  -0.666   0.5051

(Intercept)                               
Classplant                                
NativeLanguageOther                       
Frequency                                .
Trial                                     
Classplant:NativeLanguageOther            
Classplant:Frequency                      
NativeLanguageOther:Frequency             
Classplant:NativeLanguageOther:Frequency  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) Clsspl NtvLnO Frqncy Trial  Cl:NLO Clss:F NtLO:F
Classplant  -0.593                                                 
NtvLnggOthr -0.544  0.344                                          
Frequency   -0.928  0.607  0.542                                   
Trial       -0.225 -0.010 -0.016 -0.007                            
Clsspln:NLO  0.340 -0.603 -0.617 -0.364  0.032                     
Clssplnt:Fr  0.487 -0.927 -0.292 -0.517 -0.002  0.576              
NtvLnggOt:F  0.525 -0.346 -0.955 -0.571  0.010  0.618  0.303       
Clssp:NLO:F -0.273  0.559  0.485  0.303 -0.031 -0.924 -0.614 -0.502
convergence code: 0
Model failed to converge with max|grad| = 1.16397 (tol = 0.001, component 1)
Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

> summary(rePCA(fm))
$Word
Importance of components:
                         [,1]
Standard deviation     0.9322
Proportion of Variance 1.0000
Cumulative Proportion  1.0000

$Subject
Importance of components:
                         [,1]   [,2]    [,3]      [,4]
Standard deviation     1.2727 0.7694 0.07979 0.0003709
Proportion of Variance 0.7303 0.2669 0.00287 0.0000000
Cumulative Proportion  0.7303 0.9971 1.00000 1.0000000

> fm2 <- glmmTMB(Correct ~ 1 + Class*NativeLanguage*Frequency + Trial + (1+Class + Frequency + Trial|Subject) + (1|Word), lexdec, family="binomial")
Warning messages:
1: In fitTMB(TMBStruc) :
  Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
2: In fitTMB(TMBStruc) :
  Model convergence problem; false convergence (8). See vignette('troubleshooting')

> summary(fm2)
 Family: binomial  ( logit )
Formula:          
Correct ~ 1 + Class * NativeLanguage * Frequency + Trial + (1 +  
    Class + Frequency + Trial | Subject) + (1 | Word)
Data: lexdec

     AIC      BIC   logLik deviance df.resid 
      NA       NA       NA       NA     1639 

Random effects:

Conditional model:
 Groups  Name        Variance  Std.Dev. Corr              
 Subject (Intercept) 2.415e+00 1.553987                   
         Classplant  8.742e-01 0.934975 -0.46             
         Frequency   1.403e-01 0.374617 -0.97  0.56       
         Trial       1.131e-05 0.003363  0.82 -0.14 -0.64 
 Word    (Intercept) 9.221e-01 0.960246                   
Number of obs: 1659, groups:  Subject, 21; Word, 79

Conditional model:
                                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                              -1.049643   1.460669  -0.719   0.4724
Classplant                               -0.987369   2.013568  -0.490   0.6239
NativeLanguageOther                       0.992664   1.636702   0.606   0.5442
Frequency                                -0.574650   0.306207  -1.877   0.0606
Trial                                    -0.001363   0.003390  -0.402   0.6877
Classplant:NativeLanguageOther            1.424499   2.104225   0.677   0.4984
Classplant:Frequency                     -0.153430   0.480728  -0.319   0.7496
NativeLanguageOther:Frequency            -0.121959   0.349751  -0.349   0.7273
Classplant:NativeLanguageOther:Frequency -0.304610   0.528633  -0.576   0.5645

(Intercept)                               
Classplant                                
NativeLanguageOther                       
Frequency                                .
Trial                                     
Classplant:NativeLanguageOther            
Classplant:Frequency                      
NativeLanguageOther:Frequency             
Classplant:NativeLanguageOther:Frequency  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
palday commented 4 years ago

While parameterization issues can cause convergence issues, these are two different sets of problems. In particular, a random effect going to zero is something that lme4 and MixedModels.jl can deal with and that's a feature, not a bug. For such boundary fits, the gradient test isn't informative because the gradient being equal to zero is only a necessary condition for minima in the interior of a bounded parameter space. (There are also issues with the finite-differences approximation used by lme4, but this is a result of lme4 using a gradient-free method for fitting.) The eigenvalue test is more interesting; @dmbates may have a comment on whether or not we we should add it to a list of desirables.

glmmTMB uses a different fitting method (based on the gradient) and runs into a somewhat different convergence issue. A non positive-definite Hessian would also create problems for lme4, but because of the different convergence methods, the intermediate results are different and thus the two packages get stuck in different problem zones. That said, looking at the glmmTMB docs, problems with the Hessian can be the result of a boundary fit.

hikea commented 4 years ago

Thank you for clarifications!

dmbates commented 4 years ago

@hikea I have made changes to the optimization algorithm for GLMMs in the dropmulalphabeta branch of the package. Could you check your example after installing that branch as a development version? That is

using Pkg; Pkg.develop(PackageSpec(name="MixedModels", rev="dropmulalphabeta")
hikea commented 4 years ago

Can't get it working. Keep getting ERROR: git revision can not be given todevelop`` Tried installing the master version, but that didn't help either.

palday commented 4 years ago

@dmbates This example runs on the dropmulalphabeta branch.