JuliaStats / MixedModels.jl

A Julia package for fitting (statistical) mixed-effects models
http://juliastats.org/MixedModels.jl/stable
MIT License

values of GLMM parameter estimates in show method #307

Closed dmbates closed 3 years ago

dmbates commented 4 years ago

In a message to R-SIG-Mixed-Models@R-Project.org with the subject "nAGQ > 1 in lme4::glmer gives unexpected likelihood", Ben Goldstein provided simulated data:

julia> goldstein = categorical!(DataFrame(y = Int32[83, 3, 8, 78, 901, 21, 4, 1, 1, 39, 82, 3, 2, 82, 874, 18, 5, 1, 3, 50, 87, 7, 3, 67, 914, 18, 0, 1, 1, 38, 86, 13, 5, 65, 913, 13, 2, 0, 0, 48, 90, 5, 5, 71, 886, 19, 3, 0, 2, 32, 96, 1, 1, 87, 860, 21, 3, 0, 1, 54, 83, 2, 4, 70, 874, 19, 5, 0, 4, 36, 100, 11, 3, 71, 950, 21, 6, 0, 1, 40, 89, 5, 5, 73, 859, 29, 3, 0, 2, 38, 78, 13, 6, 100, 852, 24, 5, 0, 1, 39], group = repeat(1:10, outer=10)), :group);

julia> by(goldstein, :group, mn = :y => mean)
10×2 DataFrame
│ Row │ group        │ mn      │
│     │ Categorical… │ Float64 │
├─────┼──────────────┼─────────┤
│ 1   │ 1            │ 87.4    │
│ 2   │ 2            │ 6.3     │
│ 3   │ 3            │ 4.2     │
│ 4   │ 4            │ 76.4    │
│ 5   │ 5            │ 888.3   │
│ 6   │ 6            │ 20.3    │
│ 7   │ 7            │ 3.6     │
│ 8   │ 8            │ 0.3     │
│ 9   │ 9            │ 1.6     │
│ 10  │ 10           │ 41.4    │
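(Aside for anyone reproducing this today: categorical! and by are from the pre-1.0 DataFrames.jl API. A minimal sketch of the same construction and per-group summary with current DataFrames.jl and CategoricalArrays.jl, assuming DataFrames ≥ 1.0:)

using DataFrames, CategoricalArrays, Statistics

# same simulated counts as above; group cycles through 1:10 down the rows
goldstein = DataFrame(
    y = Int32[83, 3, 8, 78, 901, 21, 4, 1, 1, 39, 82, 3, 2, 82, 874, 18, 5, 1, 3, 50, 87, 7, 3, 67, 914, 18, 0, 1, 1, 38, 86, 13, 5, 65, 913, 13, 2, 0, 0, 48, 90, 5, 5, 71, 886, 19, 3, 0, 2, 32, 96, 1, 1, 87, 860, 21, 3, 0, 1, 54, 83, 2, 4, 70, 874, 19, 5, 0, 4, 36, 100, 11, 3, 71, 950, 21, 6, 0, 1, 40, 89, 5, 5, 73, 859, 29, 3, 0, 2, 38, 78, 13, 6, 100, 852, 24, 5, 0, 1, 39],
    group = categorical(repeat(1:10, outer = 10)),
)

# per-group means, replacing the deprecated by(goldstein, :group, mn = :y => mean)
combine(groupby(goldstein, :group), :y => mean => :mn)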

The peculiar thing is that fitting a simple GLMM to these data by the Laplace approximation (nAGQ = 1) and by adaptive Gauss-Hermite quadrature with nAGQ = 11 produces identical parameter estimates in the show method.

julia> m1 = fit(MixedModel, @formula(y ~ 1 + (1|group)), goldstein, Poisson())
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  y ~ 1 + (1 | group)
  Distribution: Poisson{Float64}
  Link: LogLink()

  Deviance: 193.5587

Variance components:
         Column    Variance  Std.Dev. 
group (Intercept)  3.9577026 1.9893975

 Number of obs: 100; levels of grouping factors: 10

Fixed-effects parameters:
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)   2.65175   0.632317     4.19    <1e-4
──────────────────────────────────────────────────
julia> m11 = fit(MixedModel, @formula(y ~ 1 + (1|group)), goldstein, Poisson(), nAGQ=11)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 11)
  y ~ 1 + (1 | group)
  Distribution: Poisson{Float64}
  Link: LogLink()

  Deviance: 193.5103

Variance components:
         Column    Variance  Std.Dev. 
group (Intercept)  3.9577026 1.9893975

 Number of obs: 100; levels of grouping factors: 10

Fixed-effects parameters:
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)   2.65175   0.632317     4.19    <1e-4
──────────────────────────────────────────────────

First, the coeftable method is suspect:

julia> fixef(m1)
1-element Array{Float64,1}:
 4.192196439077657

julia> coeftable(m1)
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)   2.65175   0.632317     4.19    <1e-4
──────────────────────────────────────────────────
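(To make the mismatch concrete: the estimate column of the coefficient table should agree with fixef. A minimal sketch of that check, assuming only the exported accessors fixef and coeftable and the cols field of the StatsBase.CoefTable they return:)

# the first column of the coefficient table holds the reported estimates;
# for a correct show method it should match fixef exactly
reported = first(coeftable(m1).cols[1])
first(fixef(m1)) ≈ reported   # false here: 4.1922 is stored, 2.6518 is displayed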

so I think the results being reported are wrong. Second, even the parameter estimates stored in the two fitted objects are exactly the same, so maybe something about the nAGQ method is off.

julia> m11.β
1-element Array{Float64,1}:
 4.192196439077657

julia> m1.β
1-element Array{Float64,1}:
 4.192196439077657

julia> m11.θ
1-element Array{Float64,1}:
 1.838245201739852

julia> m1.θ
1-element Array{Float64,1}:
 1.838245201739852
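(In other words, the two fits store identical β and θ even though the reported objectives differ (193.5587 vs 193.5103). A minimal sketch of that comparison, assuming the β and θ properties shown above and that deviance is the accessor behind the Deviance line in the show output:)

# identical stored parameter estimates ...
m1.β == m11.β          # true
m1.θ == m11.θ          # true
# ... yet the reported objectives differ, so the nAGQ = 11 result looks suspect
deviance(m1), deviance(m11)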

This is using

[[MixedModels]]
deps = ["BlockArrays", "BlockDiagonals", "Distributions", "Feather", "GLM", "LinearAlgebra", "NLopt", "NamedArrays", "Pkg", "PooledArrays", "ProgressMeter", "Random", "Showoff", "SparseArrays", "StaticArrays", "Statistics", "StatsBase", "StatsFuns", "StatsModels", "Tables"]
path = "/home/bates/.julia/dev/MixedModels"
uuid = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
version = "2.2.0"
dmbates commented 4 years ago

The coeftable method is using the wrong value of the fixed-effects coefficients, and the value of the standard deviations of the random effects includes a non-unit scale factor when it shouldn't. I will make those changes. The thing I am still uncertain of is the approximate standard errors.

I guess it is time to get back to profiling to make sure that the estimates of the variability are sensible. Alternatively, it might be better to first work out a parametric bootstrap method as that is conceptually simpler.
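(For the bootstrap route, a minimal sketch assuming the parametricbootstrap function exported by current MixedModels.jl releases, which returns per-replicate coefficient estimates as a row table in boot.β; the seed and the 1000 replicates are illustrative:)

using DataFrames, MixedModels, Random, Statistics

# refit the model to 1000 response vectors simulated from the fitted model
boot = parametricbootstrap(MersenneTwister(42), 1000, m1)

betas = DataFrame(boot.β)   # one row per replicate and coefficient
std(betas.β)                # empirical spread of the intercept estimate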

palday commented 4 years ago

My vote is for profiling. At least for the FE, profiling is super fast and reasonably accurate.

dmbates commented 4 years ago

There's a bit of a chicken-and-egg situation here. For the fixed-effects parameters the initial step size when doing the profiling is usually based on the approximate standard error.

palday commented 4 years ago

@dmbates My work on JellyMe4 has suggested that the standard errors are now close to lme4's, but often not as close as we would want.

In other words, this is just a reminder that we should revisit this once profiling is done.

dmbates commented 3 years ago

@palday The issue of displaying the correct parameter estimates has been fixed and a test added in 5fb6f6502, so I think this can be considered closed. What do you think?

Profiling already exists as a separate issue.

palday commented 3 years ago

I agree.