Closed Nosferican closed 5 years ago
Bump. I was waiting on trying to understand this issue, but if the release might happen before that gets addressed I can add one PR for fit(::LinearMixedModel, formula, data, args..., kw...)
.
Now that the v2.0.0 release has occurred, would you like to revisit this issue?
It is a real problem to try to fit a linear model or linear mixed model as a generalized linear model or a generalized linear mixed model because the direct calculation of the least squares or penalized least squares solution is phrased as a iterative calculation. Evaluation of convergence criteria, etc. can get tricky.
One workaround is to catch attempts to fit a GLM or GLMM with the Normal() family and its canonical Identity() link and fit it as a LM or LMM. That is unsatisfactory because you shouldn't start with a GeneralizedLinearMixedModel
object, invoke fit!
on it, and return a LinearMixedModel
object.
Another workaround is to return an error that effectively says "Don't do that" and refer the user to LinearMixedModel
. That tends to make the user a bit grumpy.
Another approach is to change the fit! method to catch the combination of Normal()
and IdentityLink
, fit the corresponding LinearMixedModel
(a GeneralizedLinearMixedModel
actually contains a LinearMixedModel
structure) and dress it up as if it had been fit as a GLMM. I would probably also warn the user that this is a silly way of doing things.
I think I like the third approach the best. I agree with the assessment of the first and second approaches.
Would it be possible to have the GeneralizedLinearMixedModel
fields with sensible values given that it is a linear mixed model? Doing a warning is fine. I guess since there isn't a straight way of treating the linear model as a generalized linear model, we could do a minor or patch version to support
fit(LinearMixedModel, formula, data, args..., kw...)
I can make the PR for it.
What I was thinking of doing is supporting
fit!(GeneralizedLinearMixedModel(formula, data, Normal()))
by detecting that case in the constructor for GeneralizedLinearMixedModel
and short-circuiting the fit!
procedure. It would be necessary to fill out the fields in the GLMM struct but there is no need to worry about using a two stage optimization.
Let me draft something and you can take a look.
I started to fake a GeneralizedLinearMixedModel
for LinearMixedModel
cases and it got sufficiently complicated and convoluted that I decided to punt and just throw an ArgumentError
for that case. See the LMMasGLMM
branch. Ler me know if you think it is important to patch this up rather than throw an error. I am perhaps less inclined than most to try to produce a result for a user, no matter how wrong-headed the request may be.
How about providing a generic dispatch for returning the appropriate model based on the model characteristics? For example,
MixedModel(formula, data, args..., kw...)
would return either a GeneralizedLinearMixedModel
or a LinearMixedModel
depending on whether the distribution is Normal
with canonical link? I took a similar approach in Econometrics.jl (see example based on response type).
That way, users doesn't get the drawback of the first approach (i.e., calling one type struct and getting a different one), we can implement an error for "wrong-headed" approaches, but it would be a gentle push towards if you want this kind of behavior, use the generic approach.
That's a good idea. I just committed a method for fit(MixedModel, f::FormulaTerm, dat, d::Distribution=Normal(); kw...) which seems to work for both LinearMixedModel and GeneralizedLinearMixedModel. Take a look at the changes in the tests.
On Fri, Aug 9, 2019 at 4:43 AM José Bayoán Santiago Calderón < notifications@github.com> wrote:
How about providing a generic dispatch for returning the appropriate model based on the model characteristics? For example,
MixedModel(formula, data, args..., kw...)
would return either a GeneralizedLinearMixedModel or a LinearMixedModel depending on whether the distribution is Normal with canonical link? I took a similar approach in Econometrics.jl (see example https://nosferican.github.io/Econometrics.jl/dev/estimators/#Ordinal-Response-Models-1 based on response type).
That way, users doesn't get the drawback of the first approach (i.e., calling one type struct and getting a different one), we can implement an error for "wrong-headed" approaches, but it would be a gentle push towards if you want this kind of behavior, use the generic approach.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/dmbates/MixedModels.jl/issues/185?email_source=notifications&email_token=AAC2UOXW4J3MW4ARXMZCMM3QDU32JA5CNFSM4IITKT2KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD36FZTI#issuecomment-519855309, or mute the thread https://github.com/notifications/unsubscribe-auth/AAC2UOSJXSVPI6HS7N4U2Y3QDU32JANCNFSM4IITKT2A .
@Nosferican How does the code in the LMMasGLMM branch look?
It looks good. I am going to add something to the docs about the feature and add any missing fit
method. I can commit to it and open the PR.
@Nosferican I have given you write permission on the repository if you want to work on this repository instead of on your own fork.
A few things I found,
LinearMixedModel
and GeneralizedLinearMixedModel
.
hints
is a positional argument in LinearMixedModel
and keyword argument in GeneralizedLinearMixedModel
.LinearMixedModel
has weights as weights
and GeneralizedLinearMixedModel
has wt
GeneralizedLinearMixedModel
have a Normal
distribution with a non canonical link? If so, the current branch would not support it as such.I am leaning towards making all arguments keyword arguments except for the Distribution
/Link
.
It is possible to use a Normal distribution with a non-Identity link although it is more of a curiosity than a common occurrence. Cook and Weisberg used a Normal distribution and an inverse link to fit the Michaelis-Menten model, Vm*c/(K+c)
where Vm and K are the parameters to be fit and c
, the concentration, is a covariate, in one of their books but I haven't seen other cases.
I agree that it would be best to standardize the arguments. I noticed weights
vs wt
myself. I guess the question is whether such a change would require a major version bump. I think going to v3.0.0 so soon after v2.0.0 would be exaggerating the pace of development. In some ways these changes are simply fixing "infelicities" in the original code. Is it possible to deprecate the original arguments, like wt
and what does that entail for semantic versioning?
I just saw something else that will eventually need to be addressed. The BlockArrays
package changed after v0.7.0 to require the underlying arrays to be the same type. This package uses heterogeneous BlockArrays
as shown by the output of describeblocks
and the version of BlockArrays
must be capped at v0.7.0.
Only a small portion of the capabilities of BlockArrays are used for A
and L
in a LinearMixedModel
. It is probably best to remove the dependency on BlockArrays
and write custom code to organize and access the blocks.
I guess it would be proper to handle the Normal distribution and canonical links as I think canonical links are more of exceptions rather than the rule (maybe Probit being the most widely used non-canonical link). I did thought about the implication of breaking major. On that issue, technically nothing in the docs and public API had documented the weights
with LinearMixedModel
and in one instance it actually gets ignored (which would be considered a bug). We could do a deprecation stage just in case it breaks anyone's code, but it might also be fair game since it wasn't explicitly documented and would only affect the v2.0.0
users. If treated as a non public API feature it could be a minor version as it gets new features or patch if it was meant to be the same name as the GeneralizedLinearModel
, but wasn't.
As for the BlockArrays, it seems like capping it with BlockArrays = "0.7"
should be good enough for now. That could be a patch release.
@Nosferican I think you meant "non-canonical links are more the exception than the rule", in which case I agree
I guess it would be proper to handle the Normal distribution and canonical links as I think canonical links are more of exceptions rather than the rule (maybe Probit being the most widely used non-canonical link). I did thought about the implication of breaking major. On that issue, technically nothing in the docs and public API had documented the
weights
withLinearMixedModel
and in one instance it actually gets ignored (which would be considered a bug). We could do a deprecation stage just in case it breaks anyone's code, but it might also be fair game since it wasn't explicitly documented and would only affect thev2.0.0
users. If treated as a non public API feature it could be a minor version as it gets new features or patch if it was meant to be the same name as theGeneralizedLinearModel
, but wasn't.
As for the BlockArrays, it seems like capping it with
BlockArrays = "0.7"
should be good enough for now. That could be a patch release.
It's already the case that BlockArrays
is capped at "0.7". I'm concerned that at some point other packages may require later versions of BlockArrays
creating version resolution problems. It seems that BlockArrays
is going to stay with homogeneous array types so the code here should be modified to remove the dependency. All that is necessary is to create Matrix of AbstractMatrix elements for A
and L
.
Good point. It seems our dependencies might not use it, but it could be that some application uses MixedModels and BlockArrays.
I am leaning towards dispatching between LinearMixedModel
and GeneralizedLinearMixedModel
on whether there is a positional argument with a Distribution
. That would be the distinction while it supports the explicit constructors as well. When using the explicit, the distribution positional argument will default to Normal()
as it does currently.
I just pushed a commit to the LMMasGLMM
branch with this type of logic. There are warnings during precompilation
julia> using MixedModels
[ Info: Recompiling stale cache file /home/bates/.julia/compiled/v1.2/MixedModels/tBiYK.ji for MixedModels [ff71e718-51f3-5ec2-a782-8ffcbfa3c316]
WARNING: Method definition fit(Type{MixedModels.MixedModel{T} where T}, StatsModels.FormulaTerm{L, R} where R where L, Any) in module MixedModels at /home/bates/.julia/dev/MixedModels/src/mixed.jl:3 overwritten at /home/bates/.julia/dev/MixedModels/src/mixed.jl:3.
** incremental compilation may be fatally broken for this module **
WARNING: Method definition fit(Type{MixedModels.MixedModel{T} where T}, StatsModels.FormulaTerm{L, R} where R where L, Any, Distributions.Distribution{F, S} where S<:Distributions.ValueSupport where F<:Distributions.VariateForm) in module MixedModels at /home/bates/.julia/dev/MixedModels/src/mixed.jl:3 overwritten at /home/bates/.julia/dev/MixedModels/src/mixed.jl:3.
** incremental compilation may be fatally broken for this module **
WARNING: Method definition #fit(Any, typeof(StatsBase.fit), Type{MixedModels.MixedModel{T} where T}, StatsModels.FormulaTerm{L, R} where R where L, Any) in module MixedModels overwritten.
** incremental compilation may be fatally broken for this module **
WARNING: Method definition #fit(Any, typeof(StatsBase.fit), Type{MixedModels.MixedModel{T} where T}, StatsModels.FormulaTerm{L, R} where R where L, Any, Distributions.Distribution{F, S} where S<:Distributions.ValueSupport where F<:Distributions.VariateForm) in module MixedModels overwritten.
** incremental compilation may be fatally broken for this module **
which I haven't worked out. Does this seem workable to you?
One of the problems with the methods may be the untyped tbl
argument. Would it work better if it was {<:Tables.Table}
in the method signature?
StatsBase.fit(::Type{LinearMixedModel}, f::FormulaTerm, tbl; kwargs...) =
fit(::Type{LinearMixedModel}, f::FormulaTerm, Tables.columntable(tbl), kwargs...)
StatsBase.fit(::Type{LinearMixedModel},
f::FormulaTerm,
tbl::Tables.ColumnTable;
kwargs...)
I think we should always have the main method work with tbl::Tables.ColumnTable
.
At least with current versions of DataFrames
and Tables
julia> DataFrame <: Tables.ColumnTable
false
Per current behavior when the functions call Tables.columntable
,
julia> data = DataFrame(a = 1:3, b = 2:4) |> Tables.columntable |> typeof
NamedTuple{(:a, :b),Tuple{Array{Int64,1},Array{Int64,1}}}
So for design would this work?
# LinearMixedModel
fit(MixedModel, f::FormulaTerm, tbl::Tables.ColumnTable;
wt, contrasts, verbose, REML)
# GeneralizedLinearMixedModel
fit(MixedModel, f::FormulaTerm, tbl::Tables.ColumnTable, d::Distribution,
l::AbstractLink = canonicallink(d);
wt, contrasts, offset, verbose, fast, nAGQ)
# LinearMixedModel
fit(MixedModel, f::FormulaTerm, tbl::Tables.ColumnTable, d::Normal, l::IdenitityLink;
wt, contrasts, verbose, REML,
offset, fast, nAGQ # These would be ignored
)
I am not sure whether being explicit about the ColumnTable
has a big effect on performance or consistency. In Econometrics.jl, I kept it as any and just assume it complies with the Tables.jl API.
I tried doing a constructor with MixedModel
, but had some issues with the {T}
so I skipped that for now and worked on the fit(::Type{MixedModel},...)
.
I pushed a couple of changes to the LMMasGLMM
to fix a failing test and to re-export some of the contrasts coding functions from StatsModels
.
Sorry for the delay, I have pushed some changes to LMMasGLMM
based on what we talked.
Closed by #186
Thanks for closing these, @Nosferican . I am ready to make a new release. I just cleaned up some of the dependencies in the main Project.toml
file and dropped the Appveyor checks for 32-bit. Is it okay with you to make a new release? I have it tagged as v2.0.1. Do you think it should be v2.1.0 because the recommended way of fitting models has changed?
Sure. I think since it is a new feature I would go for a minor release rather a patch.
Okay. Anything else you want to put in? I would like to push this out soon so I can have a bit of time if problems arise after it hits the repository. I will be offline starting this weekend for a few days.
Sounds good. You might want to check the Normal and non-canonical link combinations which seem to be broken in some cases, but that can be a patch release whenever.
Example data MixedModels is master.
is that intended?