JuliaStats / MixedModels.jl

A Julia package for fitting (statistical) mixed-effects models
http://juliastats.org/MixedModels.jl/stable
MIT License

Rescaling causes huge differences in fixed effect coefficients #638

Closed liamlundy closed 1 year ago

liamlundy commented 2 years ago

We have been using the latest version of MixedModels and have some models that hit the rescaling code added in https://github.com/JuliaStats/MixedModels.jl/pull/615 before they can be fit. The addition of the rescaling algorithm is great because it allows models that previously hit the PosDef error during fitting to pass.

We do have some concerns, however, about the results that the rescaled models are giving us. We recently noticed that the coefficients are drastically different in models that are rescaled compared to very similar models that did not get the same treatment. I ran a few variations of a model where the rescaling algorithm only kicked in for one specific variation. The fixed effects remained fairly stable when the model was not rescaled, but changed drastically when rescaling occurred. The random coefficients changing makes sense, but it seems unlikely that the fixed coefficients would be this different with only small modifications to the model form.

Script

This is the base script. The supporting data and OS info are attached at the bottom of the ticket. I attached the results from this script, as well as from some similar models, below with descriptions of exactly what was run.

using MixedModels, DataFrames, CSV, CategoricalArrays

model_form = @formula(
    volume ~ 1 +
    variable_1 +
    variable_2 +
    (1 | geo_level_3_product_level_3) +
    ((0 + variable_1) | geo_level_3_product_level_3) +
    (1 | geo_level_3_product_level_5) +
    ((0 + variable_2) | geo_level_3_product_level_5)
)

data = CSV.read("data.csv", DataFrame)

# Observation weights must be a Vector{Float64} for the `wts` keyword.
weight_vector = convert(Vector{Float64}, data[:, :observation_weight]);

interaction_columns = [
    "geo_level_3_product_level_3",
    "geo_level_3_product_level_5",
]
# Convert the grouping columns to categorical so they act as grouping factors.
for column in interaction_columns
    data[!, column] = categorical(data[!, column])
end

model = MixedModels.LinearMixedModel(model_form, data, wts=weight_vector)

model.optsum.optimizer = :LN_BOBYQA
model_fit = MixedModels.fit!(model)

print(model_fit)

MixedModels v4.7 - Two Random Effects (With Rescaling)

This is the script run as is. As you can see, the rescaling algorithm kicks in here, producing some unexpected coefficients.

[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│    that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/Development/MixedModels.jl/src/linearmixedmodel.jl:472
Minimizing 234   Time: 0:00:01 ( 5.50 ms/it)
  objective:  -1.755309756647321e6
Linear mixed model fit by maximum likelihood
 volume ~ 1 + variable_1 + variable_2 + (1 | geo_level_3_product_level_3) + (0 + variable_1 | geo_level_3_product_level_3) + (1 | geo_level_3_product_level_5) + (0 + variable_2 | geo_level_3_product_level_5)
     logLik     -2 logLik        AIC           AICc          BIC
   889840.4154 -1779680.8309 -1779664.8309 -1779664.8298 -1779586.3956

Variance components:
                               Column     Variance    Std.Dev.    Corr.
geo_level_3_product_level_5 (Intercept)  0.000000001 0.000023065
                            variable_2   0.000000100 0.000316914   .
geo_level_3_product_level_3 (Intercept)  0.000000000 0.000000000
                            variable_1   0.000000023 0.000152024   .
Residual                                 0.000000823 0.000906961
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
──────────────────────────────────────────────────────────
                   Coef.  Std. Error           z  Pr(>|z|)
──────────────────────────────────────────────────────────
(Intercept)     -1.99737  2.57146e-6  -776745.20    <1e-99
variable_1   -1405.18     0.00151095  -929999.42    <1e-99
variable_2    1125.54     0.00121299   927908.84    <1e-99
──────────────────────────────────────────────────────────

MixedModels v4.7 - One Random Effect (No Rescaling)

This is the script run with geo_level_3_product_level_3 terms commented out.

Minimizing 57    Time: 0:00:00 ( 9.68 ms/it)
Linear mixed model fit by maximum likelihood
 volume ~ 1 + variable_1 + variable_2 + (1 | geo_level_3_product_level_5) + (0 + variable_2 | geo_level_3_product_level_5)
    logLik     -2 logLik       AIC         AICc          BIC
 -159979.3828  319958.7655  319970.7655  319970.7662  320029.5920

Variance components:
                               Column    Variance   Std.Dev.    Corr.
geo_level_3_product_level_5 (Intercept)  0.00007513 0.00866766
                            variable_2    2.21153704 1.48712375   .
Residual                                 5.36207911 2.31561636
 Number of obs: 133841; levels of grouping factors: 467

  Fixed-effects parameters:
───────────────────────────────────────────────────────
                    Coef.  Std. Error       z  Pr(>|z|)
───────────────────────────────────────────────────────
(Intercept)  -0.000426835  0.00137424   -0.31    0.7561
variable_1   -0.84226      0.0176115   -47.82    <1e-99
variable_2   -1.3658       0.0959489   -14.23    <1e-45
───────────────────────────────────────────────────────

MixedModels v4.7 - Two Random Effects - No Intercepts

This is the script run with the intercepts-per-interaction removed. The data is already mean-centered for each geography x product, so the intercepts shouldn't technically be necessary.

Minimizing 54    Time: 0:00:00 (10.57 ms/it)
Linear mixed model fit by maximum likelihood
 volume ~ 1 + variable_1 + variable_2 + (0 + variable_1 | geo_level_3_product_level_3) + (0 + variable_2 | geo_level_3_product_level_5)
    logLik     -2 logLik       AIC         AICc          BIC
 -159052.2422  318104.4844  318116.4844  318116.4850  318175.3108

Variance components:
                              Column    Variance Std.Dev.
geo_level_3_product_level_5 variable_2  2.23480 1.49493
geo_level_3_product_level_3 variable_1  1.83168 1.35340
Residual                                5.28184 2.29822
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
────────────────────────────────────────────────────────
                    Coef.   Std. Error       z  Pr(>|z|)
────────────────────────────────────────────────────────
(Intercept)  -0.000678153  0.000936707   -0.72    0.4691
variable_1   -1.4881       0.171306      -8.69    <1e-17
variable_2   -1.42409      0.0964234    -14.77    <1e-48
────────────────────────────────────────────────────────
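
As a quick check of the mean-centering claim above, the per-group means of the response can be inspected directly. A minimal sketch, assuming the data and column names from the base script (the volume_mean column name is what DataFrames' combine generates):

using CSV, DataFrames, Statistics

data = CSV.read("data.csv", DataFrame)
# If the response is mean-centered within each geography x product cell,
# every per-group mean should be (numerically) zero.
gmeans = combine(groupby(data, :geo_level_3_product_level_3), :volume => mean)
extrema(gmeans.volume_mean)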

MixedModels v4.7 - Two Random Effects - No Weights

This is the script run with weights removed.

Minimizing 43    Time: 0:00:00 (12.20 ms/it)
Linear mixed model fit by maximum likelihood
 volume ~ 1 + variable_1 + variable_2 + (0 + variable_1 | geo_level_3_product_level_3) + (0 + variable_2 | geo_level_3_product_level_5)
    logLik   -2 logLik      AIC         AICc        BIC
 -80098.1994 160196.3988 160208.3988 160208.3994 160267.2252

Variance components:
                              Column   Variance Std.Dev.
geo_level_3_product_level_5 variable_2  0.937378 0.968183
geo_level_3_product_level_3 variable_1  0.904876 0.951250
Residual                                0.192165 0.438366
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
────────────────────────────────────────────────────
                 Coef.  Std. Error       z  Pr(>|z|)
────────────────────────────────────────────────────
(Intercept)  -0.107289  0.00121008  -88.66    <1e-99
variable_1   -0.917196  0.118259     -7.76    <1e-14
variable_2   -0.718006  0.0549579   -13.06    <1e-38
────────────────────────────────────────────────────

MixedModels v1.1 - Two Random Effects

This is the original script run using julia v1.1 and MixedModels v1.1. Here the script passes without any rescaling and comes back with coefficients in a similar range to the models run without rescaling.

Linear mixed model fit by maximum likelihood
 Formula: volume ~ 1 + variable_1 + variable_2 + (1 | geo_level_3_product_level_3) + ((0 + variable_1) | geo_level_3_product_level_3) + (1 | geo_level_3_product_level_5) + ((0 + variable_2) | geo_level_3_product_level_5)
     logLik        -2 logLik          AIC             BIC
 -1.59019664×10⁵  3.18039329×10⁵  3.18055329×10⁵  3.18133764×10⁵

Variance components:
                                 Column      Variance    Std.Dev.    Corr.
 geo_level_3_product_level_5 (Intercept)  0.00007483254 0.00865058
                             variable_2   2.23583700185 1.49527155  0.00
 geo_level_3_product_level_3 (Intercept)  0.00000000000 0.00000000
                             variable_1   1.82904714767 1.35242270   NaN
 Residual                                 5.27726789831 2.29723048
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
──────────────────────────────────────────────────────────
                 Estimate   Std.Error     z value  P(>|z|)
──────────────────────────────────────────────────────────
(Intercept)  -0.000406139  0.00136572   -0.297382   0.7662
variable_1   -1.48803      0.171188     -8.69236    <1e-17
variable_2   -1.42404      0.0964338   -14.767      <1e-48
──────────────────────────────────────────────────────────

Supporting Data / Info

Julia version info:

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, westmere)
Project.toml:

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
Manifest.toml omitted (machine-generated; notable pins: MixedModels v4.7.1, NLopt v0.6.5).

data.csv

liamlundy commented 2 years ago

Adding @yuexiao-marketingattribution to this thread

dmbates commented 2 years ago

Thanks for including the example and the data. We'll take a look.

dmbates commented 2 years ago

I wrote up some comments on the different fits as a Quarto document. It can be rendered as a Jupyter notebook (.ipynb), a PDF, or an HTML document. Which format would you prefer, @liamlundy and @yuexiao-marketingattribution? I enclose the PDF, but that doesn't work particularly well with some of the tables. issue638.pdf

yuexiao-marketingattribution commented 2 years ago

Hi Dr. Bates, thanks a lot for looking into this. Am I right to say that the wacky fixed coefficients seem to be related to the initial parameters, and that scaling could make them very small in some cases?

Would we be able to get around this in the model estimation algorithm?

Thanks, Yue


dmbates commented 2 years ago

The peculiar fixed-effects coefficients, the estimates of the random-effects standard deviations, and the extreme negative value of the objective are all related to the fact that the response is mean-centered and that the weights span such a wide range (6 or 7 orders of magnitude). It may be possible to use the estimates from the model without random effects for the intercept to create starting estimates for the model you wish to fit. But that still doesn't get around the fact that mean-centering the response leaves you chasing rounding error in the random effects for the intercept.
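
A minimal sketch of checking that span, assuming the data.csv and observation_weight column from the original script:

using CSV, DataFrames

data = CSV.read("data.csv", DataFrame)
wmin, wmax = extrema(data.observation_weight)
# log10 of the ratio gives the span in orders of magnitude.
@show wmax / wmin log10(wmax / wmin)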

Is there a reason not to just stay with the model that has random effects for the covariates only?

dmbates commented 2 years ago

I tried manually setting the initial values for the parameters in the model you are trying to fit and still got the message about rescaling and convergence to nonsensical estimates.

julia> m01a = let
          f = @formula(volume ~ 1 + v2 + v1 + zerocorr(1+v2|pl5) + zerocorr(1+v1|pl3))
          m = LinearMixedModel(f, dat; contrasts, wts=dat.wts)
          copyto!(m.optsum.initial, [0.0001, 0.65, 0.0001, 0.59])
          fit!(m; thin=1)
       end
[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│    that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/1FpDN/src/linearmixedmodel.jl:472
Minimizing 114   Time: 0:00:00 ( 0.95 ms/it)
  objective:  -347053.39586158074
Linear mixed model fit by maximum likelihood
 volume ~ 1 + v2 + v1 + MixedModels.ZeroCorr((1 + v2 | pl5)) + MixedModels.ZeroCorr((1 + v1 | pl3))
    logLik     -2 logLik       AIC         AICc          BIC     
  214818.8698 -429637.7397 -429621.7397 -429621.7386 -429543.3044

Variance components:
            Column    Variance   Std.Dev.    Corr.
pl5      (Intercept)  0.00000000 0.00000001
         v2           0.00174499 0.04177307   .  
pl3      (Intercept)  0.00000000 0.00001041
         v1           0.00071134 0.02667089   .  
Residual              0.01985613 0.14091177
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
──────────────────────────────────────────────────────────
                     Coef.  Std. Error         z  Pr(>|z|)
──────────────────────────────────────────────────────────
(Intercept)      0.0139727  5.75968e-5    242.59    <1e-99
v2            1142.51       0.191435     5968.11    <1e-99
v1           -1900.27       0.317759    -5980.22    <1e-99
──────────────────────────────────────────────────────────

julia> m01a.optsum.fitlog
113-element Vector{Tuple{Vector{Float64}, Float64}}:
 ([5.755952196370287e-9, 3.741368927640687e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.2440061639)
 ([1.0072916343648002e-8, 3.741368927640687e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.24400616286)
 ([5.755952196370287e-9, 6.547395623371202e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.23154658405)
 ([5.755952196370287e-9, 3.741368927640687e-5, 1.0072916343648002e-8, 3.396011795858469e-5], 326704.24400616414)
 ([5.755952196370287e-9, 3.741368927640687e-5, 5.755952196370287e-9, 5.9430206427523206e-5], 326704.2346277882)
 ([1.4389880490925721e-9, 3.741368927640687e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.2440061644)
 ([5.755952196370287e-9, 9.35342231910172e-6, 5.755952196370287e-9, 3.396011795858469e-5], 326704.2496696343)
 ([5.755952196370287e-9, 3.741368927640687e-5, 1.4389880490925721e-9, 3.396011795858469e-5], 326704.2440061638)
 ([5.755952196370287e-9, 3.741368927640687e-5, 5.755952196370287e-9, 8.490029489646172e-6], 326704.248269125)
 ⋮
 ([8.259573820488997e-8, 0.29644805996849344, 7.387684360910232e-5, 0.18927318856618783], -136810.10648119918)
 ([8.248653458552367e-8, 0.29644688624629945, 7.387678952481503e-5, 0.18927412081947173], -271214.24989013287)
 ([8.248525785966926e-8, 0.29644836729405705, 7.387687418856134e-5, 0.1892736573373612], -392612.4826040373)
 ([8.240938253121144e-8, 0.2964487496311698, 7.387684624338898e-5, 0.18927366584797933], 326704.2440061639)
 ([8.243358028139956e-8, 0.2964482077859099, 7.387689565407254e-5, 0.18927319028752818], -149085.5704245197)
 ([8.252860617632192e-8, 0.2964484292586508, 7.387696379877048e-5, 0.18927365594744391], -429637.73967741965)
 ([8.258728390571775e-8, 0.2964487421416846, 7.387696715184453e-5, 0.18927403961173647], 326704.2440061639)
 ([8.248869647385056e-8, 0.2964483572018691, 7.387703325836202e-5, 0.18927330891463165], -192835.30979473024)
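
The tail of that fitlog flips between the plateau value of roughly 326704 and huge negative values, the rounding-error signature discussed later in this thread. A quick tally, sketched against the m01a fit above (each fitlog entry is a (θ, objective) tuple):

objs = last.(m01a.optsum.fitlog)    # extract the objective from each tuple
# Count how many evaluations collapsed to a (spuriously) negative objective.
count(x -> x < 0, objs), length(objs)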

@palday may have some ideas of how to handle this.

yuexiao-marketingattribution commented 2 years ago

Yes, we are planning on removing the random effects for the intercepts. But we have to keep the weights in the model, as they do matter. The reason we included the intercepts in this test is that we used to have them in the model spec under the older version of the package, and we want to get a sense of how big the difference would be in moving to the new version. The example we gave you has the minimal set of variables that shows the problem. Liam will strip the random intercepts and see if we can come up with a simple case that gives similar errors.

Potentially related info: the scaling code that seems to be causing the problem was recently added by @palday https://github.com/palday to help us pass some models that would otherwise fail.

Thanks, Yue


palday commented 2 years ago

I have a few ideas about how to nudge the optimizer here, but probably won't have the time to test them / write them up for a few days. As @dmbates mentioned, the huge span of the weights is going to create problems no matter what, but we might be able to provide better tools for users to diagnose such problems.

yuexiao-marketingattribution commented 2 years ago

Thanks for looking into this. We understand the data is not ideal. But this is the nature of some of our data unfortunately. We appreciate your attention to this matter.


liamlundy commented 2 years ago

Description

Thank you both for looking into this and for the helpful suggestions regarding best code practice in MixedModels. I reworked the test script a little bit to hopefully better encapsulate our issues and provide some context for what we are hoping to accomplish.

As Yue mentioned, we are aware that our models do not always have the best data support, but we still need to fit them as best we can while bearing in mind the limitations of the data. Right now we are working on upgrading our very old versions of julia and MixedModels and noticed that we are getting different model results in some cases. We need to either get the results to match or at least understand / explain the differences; that was the initial impetus to test some of these cases. We are primarily testing julia (v1.6.0) / MixedModels (v4.7.1) against julia (v1.1.0) / MixedModels (v1.1.6). We were previously using intercepts per pl3 and pl5 in our old code (using MixedModels 1.1), but have subsequently removed those terms. Because we wanted a direct comparison, I temporarily added them back while we compared results; they will be removed once we are fully on the new library versions. There were a few interesting things I noticed while running this new script. The latest version of the test script is below, and I also attached the updated CSV data.

Script / Data

data.csv

using MixedModels, DataFrames, CSV, CategoricalArrays

model_form = @formula(
    y ~
    v1 +
    v2 +
    v3 +
    v4 +
    v5 +
    (1 | pl3) +
    ((0 + v1) | pl3) +
    (1 | pl5) +
    ((0 + v2) | pl5) +
    ((0 + v3) | pl5) +
    ((0 + v4) | pl5) +
    ((0 + v5) | pl5)
)

data = CSV.read("data.csv", DataFrame)

weight_vector = convert(Vector{Float64}, data[:, :w]);

contrasts = Dict{Symbol,Any}(:pl3 => Grouping(), :pl5 => Grouping());
m03 = let
    fit(MixedModel, model_form, data; contrasts, wts=weight_vector, thin=1)
end

print(m03)

Comparison with old julia / MM

I ran the same data / formula against MM 1.1 / julia 1.1. I wanted to compare this model against the current setup, so I tried running without interaction intercepts, but that ended up erroring out:

ERROR: LoadError: MethodError: no method matching rmul!(::SparseArrays.SparseMatrixCSC{Float64,Int32}, ::RepeatedBlockDiagonal{Float64,LinearAlgebra.LowerTriangular{Float64,Array{Float64,2}}})
Closest candidates are:
  rmul!(::SparseArrays.SparseMatrixCSC, !Matched::Number) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/SparseArrays/src/linalg.jl:1247
  rmul!(::AbstractArray, !Matched::Number) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/generic.jl:62
  rmul!(::SparseArrays.SparseMatrixCSC, !Matched::LinearAlgebra.Diagonal) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/SparseArrays/src/linalg.jl:1257
  ...
Stacktrace:
 [1] updateL!(::LinearMixedModel{Float64}) at /Users/liam/.julia/packages/MixedModels/YIBOE/src/pls.jl:153
 [2] #fit!#16(::Bool, ::Nothing, ::Function, ::LinearMixedModel{Float64}) at /Users/liam/.julia/packages/MixedModels/YIBOE/src/pls.jl:205
 [3] fit!(::LinearMixedModel{Float64}) at /Users/liam/.julia/packages/MixedModels/YIBOE/src/pls.jl:174
 [4] top-level scope at none:0
 [5] include at ./boot.jl:326 [inlined]
 [6] include_relative(::Module, ::String) at ./loading.jl:1038
 [7] include(::Module, ::String) at ./sysimg.jl:29
 [8] exec_options(::Base.JLOptions) at ./client.jl:267
 [9] _start() at ./client.jl:436
in expression starting at /Users/liam/Development/TestJuliaVersions/rescaling_issue/run_old_mm.jl:27

Because of that, I compared the new and old versions with intercepts, just to have a consistent model spec between the two runs. The results from the old model are as follows:

Linear mixed model fit by maximum likelihood
 Formula: y ~ v1 + v2 + v3 + v4 + v5 + (1 | pl3) + ((0 + v1) | pl3) + (1 | pl5) + ((0 + v2) | pl5) + ((0 + v3) | pl5) + ((0 + v4) | pl5) + ((0 + v5) | pl5)
     logLik        -2 logLik          AIC             BIC
 -1.10806429×10⁵  2.21612858×10⁵  2.21640858×10⁵   2.2177812×10⁵

Variance components:
              Column      Variance      Std.Dev.     Corr.
 pl5      (Intercept)  0.000016511126 0.0040633884
          v2           2.332089516651 1.5271180428  0.00
          v3           0.046928422078 0.2166296888  0.00  0.00
          v4           0.235310853852 0.4850885011  0.00  0.00  0.00
          v5           0.161474208675 0.4018385356  0.00  0.00  0.00  0.00
 pl3      (Intercept)  0.000000000000 0.0000000000
          v1           2.203897910864 1.4845531014   NaN
 Residual              2.545362393630 1.5954191906
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
──────────────────────────────────────────────────────────
                Estimate    Std.Error     z value  P(>|z|)
──────────────────────────────────────────────────────────
(Intercept)  -0.00075639  0.000861024   -0.878477   0.3797
v1           -1.53495     0.181469      -8.45849    <1e-16
v2           -1.29124     0.0923872    -13.9764     <1e-43
v3            0.211161    0.0161327     13.0891     <1e-38
v4            0.926979    0.0664065     13.9592     <1e-43
v5            0.439985    0.0391912     11.2266     <1e-28
──────────────────────────────────────────────────────────

New julia / MM

When I ran against the new versions, I hit the rescaling both times and ended up with erratic coefficients.

[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│    that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/1FpDN/src/linearmixedmodel.jl:472
Minimizing 273   Time: 0:00:02 ( 7.35 ms/it)
  objective:  242986.62402073178
Linear mixed model fit by maximum likelihood
 y ~ 1 + v1 + v2 + v3 + v4 + v5 + (1 | pl3) + (0 + v1 | pl3) + (1 | pl5) + (0 + v2 | pl5) + (0 + v3 | pl5) + (0 + v4 | pl5) + (0 + v5 | pl5)
    logLik     -2 logLik       AIC         AICc          BIC
  396517.3931 -793034.7862 -793006.7862 -793006.7831 -792869.5245

Variance components:
            Column     Variance    Std.Dev.    Corr.
pl5      (Intercept)  0.000000000 0.000016344
         v2           0.000000000 0.000002672   .
         v3           0.000000083 0.000287970   .     .
         v4           0.000000018 0.000133496   .     .     .
         v5           0.000000933 0.000965954   .     .     .     .
pl3      (Intercept)  0.000000121 0.000348053
         v1           0.000000000 0.000015570   .
Residual              0.001317123 0.036292183
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
──────────────────────────────────────────────────────────
                   Coef.   Std. Error          z  Pr(>|z|)
──────────────────────────────────────────────────────────
(Intercept)  -102.373     0.00583583   -17542.13    <1e-99
v1             -2.58581   0.000288831   -8952.70    <1e-99
v2             -1.1135    0.000251326   -4430.51    <1e-99
v3              0.812915  9.68339e-5     8394.93    <1e-99
v4              4.38554   0.00022599    19405.95    <1e-99
v5            -14.3409    0.000878329  -16327.44    <1e-99
──────────────────────────────────────────────────────────
[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│    that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/1FpDN/src/linearmixedmodel.jl:472
Minimizing 210   Time: 0:00:01 ( 8.18 ms/it)
Linear mixed model fit by maximum likelihood
 y ~ 1 + v1 + v2 + v3 + v4 + v5 + (0 + v1 | pl3) + (0 + v2 | pl5) + (0 + v3 | pl5) + (0 + v4 | pl5) + (0 + v5 | pl5)
     logLik     -2 logLik        AIC           AICc          BIC
   775649.4150 -1551298.8300 -1551274.8300 -1551274.8276 -1551157.1771

Variance components:
         Column  Variance    Std.Dev.    Corr.
pl5      v2  0.000000102 0.000318700
         v3  0.000000015 0.000121865   .
         v4  0.000000022 0.000147820   .     .
         v5  0.000000050 0.000224105   .     .     .
pl3      v1  0.000000124 0.000352745
Residual     0.000004545 0.002131920
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:

────────────────────────────────────────────────────────────
                    Coef.   Std. Error           z  Pr(>|z|)
────────────────────────────────────────────────────────────
(Intercept)     0.0415524  8.82224e-7     47099.59    <1e-99
v1           1924.35       0.00688459    279515.85    <1e-99
v2           -377.472      0.00134463   -280726.18    <1e-99
v3            256.462      0.000916584   279802.01    <1e-99
v4             30.1686     0.000108418   278262.78    <1e-99
v5             55.138      0.000196982   279914.44    <1e-99
────────────────────────────────────────────────────────────

I'm not sure if this is helpful, but I wanted to at least give a better picture of what we are trying to accomplish and why we are confused about the coefficients we are getting with the latest models. As always, we really appreciate your help. Please let me or Yue know if there is anything we can do to help debug, etc.

palday commented 2 years ago

@liamlundy The method error you ran into is the result of a bugfix in Julia's LinearAlgebra standard library that led to a return type changing in Julia 1.6 compared to older Julia versions (cf. #431). Once 1.6 became LTS, we dropped the duplicated methods necessary to support pre-1.6 behavior.

@dmbates I first thought that some of the numerical issues might be arising from the Xy storage we introduced in 4.0, but I can't fit these models on Julia 1.5 and MixedModels 2.4. The 2.x series is about my limit for time travel, partly because going back to 1.0 would require me to deal with Julia from the 0.6-1.0 transition period, which is just a little bit too painful.

I was able to fit this model in lme4:

# Linear mixed model fit by maximum likelihood  ['lmerMod']
# Formula: y ~ v1 + v2 + v3 + v4 + v5 + (1 | pl3) + ((0 + v1) | pl3) + (1 |
#     pl5) + ((0 + v2) | pl5) + ((0 + v3) | pl5) + ((0 + v4) |
#     pl5) + ((0 + v5) | pl5)
#    Data: data
# Weights: data$w

#       AIC       BIC    logLik  deviance  df.resid
#  221640.9  221778.1 -110806.4  221612.9    133827

# Scaled residuals:
#      Min       1Q   Median       3Q      Max
# -13.8622  -0.4886  -0.1377   0.1888  27.0129

# Random effects:
#  Groups   Name        Variance  Std.Dev.
#  pl5      v5          1.615e-01 0.401829
#  pl5.1    v4          2.353e-01 0.485084
#  pl5.2    v3          4.693e-02 0.216635
#  pl5.3    v2          2.331e+00 1.526889
#  pl5.4    (Intercept) 1.651e-05 0.004064
#  pl3      v1          2.206e+00 1.485228
#  pl3.1    (Intercept) 0.000e+00 0.000000
#  Residual             2.545e+00 1.595419
# Number of obs: 133841, groups:  pl5, 467; pl3, 79

# Fixed effects:
#               Estimate Std. Error t value
# (Intercept) -0.0007564  0.0008610  -0.878
# v1          -1.5349996  0.1815460  -8.455
# v2          -1.2912605  0.0923754 -13.978
# v3           0.2111613  0.0161330  13.089
# v4           0.9269805  0.0664061  13.959
# v5           0.4399864  0.0391905  11.227

This gives me similar results to non-mixed/OLS regression:

# y ~ 1 + v1 + v2 + v3 + v4 + v5

# Coefficients:
# ─────────────────────────────────────────────────────────────────────────────────────
#                     Coef.   Std. Error        t  Pr(>|t|)     Lower 95%     Upper 95%
# ─────────────────────────────────────────────────────────────────────────────────────
# (Intercept)  -0.000575762  0.000112393    -5.12    <1e-06  -0.000796048  -0.000355476
# v1           -0.934877     0.00206077   -453.65    <1e-99  -0.938916     -0.930838
# v2           -1.81368      0.00188045   -964.49    <1e-99  -1.81736      -1.80999
# v3            0.160488     0.000510854   314.16    <1e-99   0.159487      0.16149
# v4            1.5533       0.00112932   1375.43    <1e-99   1.55108       1.55551
# v5            1.16306      0.000691772  1681.28    <1e-99   1.16171       1.16442
# ─────────────────────────────────────────────────────────────────────────────────────

Notably, both the MM 1.x fits provided by the OP and the lme4 fits shrink the random intercept for pl3 all the way to zero.

Extracting the parameter vector from lme4 and using it in MixedModels requires permuting a few things (and I'm not sure JellyMe4 would handle this correctly because of the zeroed correlations):

lme4:

  1. pl5-v5
  2. pl5-v4
  3. pl5-v3
  4. pl5-v2
  5. pl5-(Intercept)
  6. pl3-v1
  7. pl3-(Intercept)

Julia:

  1. pl5-(Intercept)
  2. pl5-v2
  3. pl5-v3
  4. pl5-v4
  5. pl5-v5
  6. pl3-(Intercept)
  7. pl3-v1

Taking this permutation into account, I tried inserting the parameter vector from lme4 into a MixedModels fit (directly updating a model object, not using JellyMe4) and I also got a positive definite error. There's apparently something different in the way the numerical error accumulates there. I'm vaguely curious about the BLAS differences leading to this, but that's a much deeper rabbit hole than I have time for. It might also be related to the sorting of the random effects and how error accumulates and propagates.
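
For concreteness, a sketch of that direct update, assuming θ_lme4 already holds lme4's theta vector (e.g. exported from R via getME(fit, "theta")) and m is the corresponding unfitted LinearMixedModel; setθ! is unexported, hence the qualified calls:

# lme4 slot supplying each Julia slot, per the two orderings listed above.
perm = [5, 4, 3, 2, 1, 7, 6]
θ = θ_lme4[perm]
# Install θ, refactor, and evaluate the objective (-2 log likelihood) there;
# the refactorization is where the PosDef error surfaces.
objective(MixedModels.updateL!(MixedModels.setθ!(m, θ)))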

I also tried out a few different optimizers to see if exploring the parameter space in a different way would solve the problem, but it didn't seem to help.

yuexiao-marketingattribution commented 2 years ago

Any updates on this? My understanding is that lme4 gives us the same results as the older version of MixedModels and we should align the two.

palday commented 2 years ago

I want to reiterate and expand upon a few points made by @dmbates.

  1. Your weights span 7 orders of magnitude, which makes the problem numerically extremely difficult and susceptible to things like catastrophic cancellation. Even means and sums are difficult to compute reliably over such a span without taking a number of extra steps (see the toy example after this list).
  2. Your groups are already mean-centered, which means that the true value of the associated variance components for the intercept is zero. Nonetheless, you have the random intercepts in your model.
  3. (1) and (2) interact in a peculiar and challenging way. In the model without weights, the software can deal with being close to the boundary of the parameter space, but the rounding error from the huge weights has a catastrophic effect and causes the computed log likelihood to be extremely large (and hence the objective, i.e. -2 log likelihood, to be negative, which isn't a problem per se, but is often indicative of scaling issues). The values directly at zero, however, have a smaller albeit correct log likelihood, because multiplying through by zero "solves" the rounding error. Because we're ultimately performing maximum likelihood estimation, the values near but not at the boundary are thus "preferred" by the model-fitting process, even though we know those values are erroneous.
  4. The same rounding error that leads to a very large estimated log likelihood also leads to incorrect estimates of the fixed effects. This is apparent when we compare the mean-squared error of a model with zeroed estimates of the variance components to the mean-squared error of the problematic model. The maximum likelihood estimate should have the smallest mean-squared error, but it doesn't, because the estimated likelihood is incorrect and thus the software hasn't actually found the maximum likelihood estimate.
  5. I looked at a pairs plot of the different predictors. The joint distribution / covariance of the predictors may also contribute to the general numerical difficulties (see below).
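
A toy illustration of point (1), with assumed magnitudes only: Float64 silently drops a term once it falls below the floating-point resolution of the running sum, and intermediate quantities in a weighted deviance can mix terms far enough apart for that to matter.

big, small = 1e8, 1e-8
# eps(big) ≈ 1.49e-8 is the spacing of Float64 values near 1e8, so `small`
# is below the resolution of the sum and vanishes entirely.
@show (big + small) - big   # prints 0.0, not 1e-8
@show eps(big)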

[pairs plot of the predictors omitted]

Let's take a look at how these things culminate in (5). First some setup:

using CSV
using DataFrames
using Downloads
using MixedModels

const CSV_URL = "https://github.com/JuliaStats/MixedModels.jl/files/9649005/data.csv"

data = CSV.read(Downloads.download(CSV_URL), DataFrame)
wts = data[!, :w]
contrasts = Dict(:pl3 => Grouping(), :pl5 => Grouping());

We can force correlations to zero in Julia with zerocorr instead of writing things as separate terms. This is similar to the || syntax in lme4. If we force the correlations under pl3 to zero but allow the correlations under pl5 to be nonzero, then we get not-terrible estimates:

julia> m_zc_pl3 = let f = @formula(y ~ v1 + v2 + v3 + v4 + v5 +
                            zerocorr(1 + v1 | pl3) +
                            (1 + v2 + v3 + v4 + v5 | pl5))

           fit(MixedModel, f, data; wts, contrasts)
       end
[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│    that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/jPOIo/src/linearmixedmodel.jl:481
Minimizing 685   Time: 0:00:01 ( 1.58 ms/it)
  objective:  225921.55359220033
Linear mixed model fit by maximum likelihood
 y ~ 1 + v1 + v2 + v3 + v4 + v5 + MixedModels.ZeroCorr((1 + v1 | pl3)) + (1 + v2 + v3 + v4 + v5 | pl5)
    logLik     -2 logLik       AIC         AICc          BIC     
 -112960.7761  225921.5522  225969.5522  225969.5612  226204.8580

Variance components:
            Column    Variance   Std.Dev.    Corr.
pl5      (Intercept)  0.00000001 0.00008912
         v2           2.25735611 1.50245003 -1.00
         v3           0.04660433 0.21588037 +0.17 -0.17
         v4           0.24307148 0.49302280 +0.32 -0.32 +0.50
         v5           0.17010131 0.41243341 +0.33 -0.33 +0.23 +0.84
pl3      (Intercept)  0.00000001 0.00012036
         v1           0.00000000 0.00001888   .  
Residual              2.63645886 1.62371760
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
────────────────────────────────────────────────────────
                    Coef.   Std. Error       z  Pr(>|z|)
────────────────────────────────────────────────────────
(Intercept)  -0.000738517  0.000663529   -1.11    0.2657
v1           -0.849216     0.0124344    -68.30    <1e-99
v2           -1.20574      0.0910417    -13.24    <1e-39
v3            0.208185     0.0160558     12.97    <1e-37
v4            0.615806     0.0524811     11.73    <1e-31
v5            0.397616     0.038537      10.32    <1e-24
────────────────────────────────────────────────────────

We can also consider a model without the random intercept under pl3 and this also gives a sensible fit:

julia> m_no_int_pl3 = let f = @formula(y ~ v1 + v2 + v3 + v4 + v5 +
                            (0 + v1 | pl3) +
                            (1 + v2 + v3 + v4 + v5 | pl5))

           fit(MixedModel, f, data; wts, contrasts)
       end
[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│    that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/jPOIo/src/linearmixedmodel.jl:481
Minimizing 1047  Time: 0:00:01 ( 1.39 ms/it)
  objective:  225921.55369228724
Linear mixed model fit by maximum likelihood
 y ~ 1 + v1 + v2 + v3 + v4 + v5 + (0 + v1 | pl3) + (1 + v2 + v3 + v4 + v5 | pl5)
    logLik     -2 logLik       AIC         AICc          BIC     
 -112960.7658  225921.5316  225967.5316  225967.5398  226193.0329

Variance components:
            Column    Variance   Std.Dev.    Corr.
pl5      (Intercept)  0.00000001 0.00009248
         v2           2.25715366 1.50238266 -1.00
         v3           0.04662557 0.21592954 +0.18 -0.17
         v4           0.24327977 0.49323399 +0.32 -0.32 +0.50
         v5           0.17011576 0.41245092 +0.33 -0.33 +0.23 +0.84
pl3      v1           0.00000000 0.00003390
Residual              2.63645836 1.62371745
 Number of obs: 133841; levels of grouping factors: 467, 79

  Fixed-effects parameters:
───────────────────────────────────────────────────────
                   Coef.   Std. Error       z  Pr(>|z|)
───────────────────────────────────────────────────────
(Intercept)  -0.00073551  0.000662218   -1.11    0.2667
v1           -0.849216    0.0124344    -68.30    <1e-99
v2           -1.20574     0.0910375    -13.24    <1e-39
v3            0.20818     0.0160583     12.96    <1e-37
v4            0.615521    0.0524845     11.73    <1e-31
v5            0.39759     0.0385371     10.32    <1e-24
───────────────────────────────────────────────────────

Technically, these are nested models, so we can compute a likelihood ratio test, but the rounding error and boundary effects also show up here:

julia> MixedModels.likelihoodratiotest(m_zc_pl3, m_no_int_pl3)
Model Formulae
1: y ~ 1 + v1 + v2 + v3 + v4 + v5 + (0 + v1 | pl3) + (1 + v2 + v3 + v4 + v5 | pl5)
2: y ~ 1 + v1 + v2 + v3 + v4 + v5 + MixedModels.ZeroCorr((1 + v1 | pl3)) + (1 + v2 + v3 + v4 + v5 | pl5)
────────────────────────────────────────────────────
     model-dof    -2 logLik       χ²  χ²-dof  P(>χ²)
────────────────────────────────────────────────────
[1]         23  225921.5316                         
[2]         24  225921.5522  -0.0206       1     NaN
────────────────────────────────────────────────────

The more complex model is supposedly a worse fit, so we get a NaN for the result, but we should actually just interpret this as "no difference within floating point tolerance" and p = 1. In other words, we favor the model without a random intercept for pl3. This makes sense -- we know the true value to be zero.
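
Numerically, that interpretation amounts to clamping the slightly negative χ² statistic at zero; a quick check (Distributions.jl is assumed here just for the tail probability):

using Distributions

χ² = max(0.0, 225921.5316 - 225921.5522)  # deviance difference, clamped at zero
ccdf(Chisq(1), χ²)                        # 1.0 -- no evidence against the simpler model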

Interestingly, I was unable to fit these models with zerocorr for pl5. I think this is because, without the correlation parameters, the optimizer gets stuck in the region near the edge of the parameter space where rounding error leads to incorrect values of the log likelihood.

All of these are variations on a common theme: a lot of things that are well defined in theory are not well defined in numerical practice, and part of data analysis is learning the limits of computation.

Now, you may be asking why lme4 and earlier versions of MixedModels did not run into these problems. I have a few speculations:

  1. lme4 orders the random effects differently, and this changes the order of certain sums, which can have implications for rounding error in ill-conditioned problems.
  2. MixedModels.jl has had a few algorithmic advances over the years that generally improve the speed of the evaluation of the log likelihood. Generally we would expect this to have better numerical stability, but depending on a number of factors related to the way linear algebra is performed on modern CPUs, this may lead to rounding errors appearing in different places than in lme4. Perhaps that difference is all it takes.
  3. One of the algorithmic improvements is related to (1): MixedModels.jl doesn't need to create fake groups to suppress the correlations between terms like lme4 does. (This is visible in the .n suffixes in the lme4 output of the random effects.)

As I was writing this up, (3) occurred to me and I decided to check out what happens if we force MixedModels.jl to do things like lme4 does. This isn't exactly trivial because MixedModels.jl really wants to combine random effects with the same grouping variable into a single term with an appropriately zeroed correlation matrix. But we can do this by creating new grouping variables that are actually just copies of the original:

select!(data, :,
        :pl3 => :pl3a,
        :pl3 => :pl3b,
        :pl5 => :pl5a,
        :pl5 => :pl5b,
        :pl5 => :pl5c,
        :pl5 => :pl5d,
        :pl5 => :pl5e)

We also need to add in the relevant contrasts:

contrasts = merge(contrasts, Dict(:pl3a => Grouping(),
                                  :pl3b => Grouping(),
                                  :pl5a => Grouping(),
                                  :pl5b => Grouping(),
                                  :pl5c => Grouping(),
                                  :pl5d => Grouping(),
                                  :pl5e => Grouping()))

Finally, we need to define a few extra matrix multiplication methods. It turns out that we've forced the internal representation to be a little weird and so we don't have some of the relevant specializations:

using LinearAlgebra

# Right-multiplication by Λ for a single-column (scalar) random-effects term:
# the relative covariance factor λ is 1×1, so this is just a scalar rescaling.
MixedModels.rmulΛ!(A::Diagonal{T}, B::MixedModels.ReMat{T,1}) where {T} = rmul!(A, only(B.λ))

# Rank update C := α * A * A' + β * C, specialized for diagonal C and A.
function MixedModels.rankUpdate!(C::Hermitian{T, Diagonal{T, Vector{T}}}, A::Diagonal{T, Vector{T}}, α, β) where {T}
    size(C) == size(A) || throw(DimensionMismatch("Diagonal matrices unequal size"))
    C.data.diag .*= β
    C.data.diag .+= α .* abs2.(A.diag)
    return C
end

Now we fit the model:

julia> m_form_split = let f = @formula(y ~ v1 + v2 + v3 + v4 + v5 +
                                       (1 | pl3a) + ((0 + v1) | pl3b) +
                                       (1 | pl5a) + ((0 + v2) | pl5b) +
                                       ((0 + v3) | pl5c) + ((0 + v4) | pl5d) +
                                       ((0 + v5) | pl5e))
           fit(MixedModel, f, data; wts, contrasts)
       end
Minimizing 1402  Time: 0:00:37 (26.52 ms/it)
Linear mixed model fit by maximum likelihood
 y ~ 1 + v1 + v2 + v3 + v4 + v5 + (1 | pl3a) + (0 + v1 | pl3b) + (1 | pl5a) + (0 + v2 | pl5b) + (0 + v3 | pl5c) + (0 + v4 | pl5d) + (0 + v5 | pl5e)
    logLik     -2 logLik       AIC         AICc          BIC     
 -110806.4290  221612.8581  221640.8581  221640.8612  221778.1198

Variance components:
            Column    Variance   Std.Dev.  
pl5a     (Intercept)  0.00001651 0.00406315
pl5b     v2           2.33149835 1.52692447
pl5c     v3           0.04692059 0.21661162
pl5d     v4           0.23529473 0.48507189
pl5e     v5           0.16156129 0.40194688
pl3a     (Intercept)  0.00000000 0.00000000
pl3b     v1           2.20171276 1.48381696
Residual              2.54536460 1.59541988
 Number of obs: 133841; levels of grouping factors: 467, 467, 467, 467, 467, 79, 79

  Fixed-effects parameters:
────────────────────────────────────────────────────────
                    Coef.   Std. Error       z  Pr(>|z|)
────────────────────────────────────────────────────────
(Intercept)  -0.000756406  0.000861013   -0.88    0.3797
v1           -1.5349       0.181384      -8.46    <1e-16
v2           -1.29126      0.0923772    -13.98    <1e-43
v3            0.211162     0.0161316     13.09    <1e-38
v4            0.926982     0.0664047     13.96    <1e-43
v5            0.439968     0.0392001     11.22    <1e-28
────────────────────────────────────────────────────────

which gives an identical fit to lme4. Note that this took several hundred more iterations to fit than the more complex model (with a correlation structure for pl5) above and took nearly 20x as long per iteration.

The more compact representation for suppressed correlations was introduced early in the 2.x release series (September 2019), so this also explains why things worked under 1.x but not under any recent release.

@dmbates this is perhaps interesting both because I found a model where we hit our fallback rankUpdate! method and because I found a case where amalgamate causes problems.

kliegl commented 2 years ago

This is fantastic.

yuexiao-marketingattribution commented 2 years ago

This starts to make a lot of sense. Our experience testing the code suggests that (1) rescaling is triggered much more frequently when we have more than one interaction level; (2) keeping the random intercept contributes to the issue; and (3) newer CPUs also occasionally contribute to coefficient discrepancies.

I understand the performance implications of a fix to this issue and wonder: (1) would it be possible to detect the extreme likelihood values and only invoke the fix in those cases? (2) Would it be helpful to avoid the problem by specifying the model along the lines of your test before we pass the data to fit the model? Or is that something that can be embedded in the package?

Thanks a lot for your continued effort on this issue. It is greatly appreciated.

yuexiao-marketingattribution commented 2 years ago

So we did a bunch of tests using the same data. Here is a summary of everything we observed:

  1. We tried to replicate @palday's suggestion to split the interaction terms and force-suppress the correlations. Depending on the machine we used, it took 11-30 minutes to solve using Julia 1.6 / MixedModels (current master code), rather than the 37 s that @palday's output suggests. The solution does come very close to what Julia 1.1 / MixedModels 1.x and R got, but given the performance, we can't go with this approach.

  2. We tried to reduce the order of magnitude of the weight variable in two ways: (1) clamping all weights below 1 to 1 and all weights above 1000 to 1000; (2) taking the square root of the weights to shrink their spread (see the sketch after the comparison image below). In both cases, if I keep the original model formula (i.e., with random intercepts and suppressed correlations), I continue to get wrong fixed-effects coefficients in the latest Julia and MixedModels.

  3. So I tried allowing unsuppressed correlations, and/or reducing the spread of the weights (using the square root of the original weight variable, labeled wts_2 in the attached data), and/or removing the random intercept, in the cases where estimation converges, since these are supposedly the culprits of the estimation issue in the latest code version. R and Julia 1.1 / MixedModels 1.1.7 give me very close fixed-coefficient estimates. Julia 1.6 / MixedModels 4.8 will solve, but the coefficients are different enough that it affects the interpretation. If I suppress the correlations among random coefficients following @palday's suggestion, I can get the same coefficients, but that took 8 minutes on my machine rather than the 18 seconds in Julia 1.1. I did notice that with the shrunken weight variable, R doesn't give me the near-singularity warning any more. Below is a comparison of the fixed coefficients. I attached the data and a more detailed summary at the end.

[image: comparison table of fixed-effects coefficients across versions]
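
For reference, the two weight transformations described in (2) might look like this sketch (the observation_weight column name is assumed from the original script; wts_2 is the square-root version mentioned in (3)):

# (1) clamp the weights into [1, 1000]
wts_clamped = clamp.(data.observation_weight, 1.0, 1000.0)

# (2) take the square root to shrink the spread (the wts_2 column)
wts_sqrt = sqrt.(data.observation_weight)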

It looks like, for our purposes, MixedModels 2.x and later are not stable enough. We will hold off on the transition until this can be resolved. Thanks a lot for your help so far.

julia version comparison 10-30.xlsx data_4.csv

palday commented 2 years ago

@yuexiao-marketingattribution I'm a little bit surprised by the performance you've found -- 11-30 minutes is an order of magnitude slower than what I found on any of my personal devices. Can you provide the output of versioninfo() from a typical machine?

yuexiao-marketingattribution commented 2 years ago

Below are the specs of my machine. @liamlundy's machine uses newer chips and tends to take less time than mine.

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, icelake-client)
Environment:
  JULIA_PROJECT = [local-path]

kliegl commented 2 years ago

Just curious: Why are you not using the current Julia Version 1.8.2 (and all the updates coming with it)?

yuexiao-marketingattribution commented 2 years ago

Not sure how much difference this will make, but we could try. We started this effort to upgrade Julia when 1.6.5 was the latest version. The general issue with MixedModels, however, seems to start when MixedModels went from 1.1.7 to 2.0.0.

kliegl commented 2 years ago

Also, my interpretation of Phil's reconstruction of the lme4 solution by putting MixedModels.jl in a straitjacket was that the lme4 solution is for various reasons, say, spurious and should not be used as a platform for your work. I am happy to stand corrected if I misunderstood.

palday commented 2 years ago

@kliegl I don't know if MixedModels 1.x will run on Julia 1.6+. There was a change in the LinearAlgebra stdlib in Julia 1.6 that changed the type of a particular matrix, so we had to change the methods we used. We never backported the changed methods to older releases.

kliegl commented 2 years ago

@palday (Why) would it make sense in general to go back to MixedModels.jl 1.x?

yuexiao-marketingattribution commented 2 years ago

A few notes from my end:

  1. I can confirm MixedModels 1.x won't run on Julia 1.6+.
  2. I understand that lme4 won't necessarily be the benchmark. What puzzles me is this: lme4 does warn about near singularity with the original weights variable, but it no longer shows that warning for the model with the shrunken weights. Yet its solution for this model is too different from the MixedModels 4.8 solution to be rounding error.

palday commented 2 years ago

> Also, my interpretation of Phil's reconstruction of the lme4 solution by putting MixedModels.jl in a straitjacket was that the lme4 solution is for various reasons, say, spurious and should not be used as a platform for your work. I am happy to stand corrected if I misunderstood.

This is pretty much correct. Our stance is that we're hitting the boundaries of the software because the modelling problem here is numerically ill-conditioned, which may reflect an underlying mismatch between the desired inference and the data. We want to support as many models as possible and provide better hints about when and how things may fail, but there is a fundamental limit to automated checks.

There is also a very real tradeoff for my limited development time -- this is a hobby project for me. Even if implementing a fix for a particular situation is relatively straightforward, it increases the maintenance burden both by introducing more code and increasing the number of corner cases we have to check.

I have actually mocked up what it would take to force separate terms to remain separate. It turns out that this violates assumptions elsewhere in the code and requires changing a surprising number of lines: kwarg for optional amalgamate. Because those lines also involve output behavior, it also means that there would be a lot of tests to write.

I suspect there are better ways to solve this by changing more fundamental aspects of the algorithm when weights are present, but those also require time to research and think about.

One alternative would be using higher precision numbers (so something besides double precision/Float64). MixedModels.jl is itself written generically enough to handle this (this is the role of the type parameter in LinearMixedModel), but the nonlinear optimizer we use in the back end does not support anything besides double precision. We use NLopt, which provides a number of optimization algorithms but requires double precision for all of them. I've long wanted to try supporting MathOptInterface, but that would require changing some of the oldest and most performance-critical parts of the codebase and would definitely be a breaking change. It might be possible to modify the computation to use higher precision for intermediate steps in a non-breaking way, but this would definitely incur a performance penalty.
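
To illustrate the intermediate-precision idea, here is a toy sketch (not MixedModels internals; Double64 comes from the third-party DoubleFloats.jl package, and the vectors are invented):

using DoubleFloats

w = [1.0e7, 1.0e3, 1.0, 1.0e-1]  # weights spanning many orders of magnitude
r = [1.0e-4, 0.3, 3.0, 5.0]      # residuals

wss64  = sum(w .* abs2.(r))                                 # plain Float64 accumulation
wss128 = Float64(sum(Double64.(w) .* abs2.(Double64.(r))))  # wider accumulation, rounded once at the end

Every such accumulation would run noticeably slower than native Float64, which is the performance penalty mentioned above.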

> @palday (Why) would it make sense in general to go back to MixedModels.jl 1.x?

I only became heavily involved with MixedModels development shortly after the 2.0 release, so I am not really familiar with the internals from back then. However, I did fit a number of models using the 1.x releases for my own research. To the best of my knowledge, the linear mixed model code there is correct, although generally much more feature limited than later releases. (There are also compatibility issues with other packages in the Julia ecosystem.) So if you are able to fit the linear mixed model you need with 1.x, go for it. However, I would not fully trust generalized linear mixed models from the 1.x series -- finding a few issues in that code is actually how I became involved with MixedModels.jl.