JuliaStats / Lasso.jl

Lasso/Elastic Net linear and generalized linear models

Model type LassoModel doesn't support intercept #74

Open ForceBru opened 1 year ago

ForceBru commented 1 year ago

Code that doesn't work

julia> using DataFrames, Lasso

julia> df = DataFrame(x=randn(100), y=3randn(100) .+ 1);

julia> fit(LassoModel, @formula(x ~ 1 + y), df)
ERROR: ArgumentError: Model type LassoModel doesn't support intercept specified in formula x ~ 1 + y
Stacktrace:
 [1] apply_schema(t::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, schema::StatsModels.Schema, Mod::Type{LassoModel})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/schema.jl:288
 [2] ModelFrame(f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}}; model::Type{LassoModel}, contrasts::Dict{Symbol, Any})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:84
 [3] kwcall(::NamedTuple{(:model, :contrasts), Tuple{UnionAll, Dict{Symbol, Any}}}, ::Type{ModelFrame}, f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:73
 [4] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:85
 [5] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame)
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:78
 [6] top-level scope
   @ REPL[7]:1

Why can't I manually specify an intercept, as in @formula(x ~ 1 + y)? The documentation for @formula (?@formula) says:

1, 0, and -1 indicate the presence (for 1) or absence (for 0 and -1) of an intercept column.

So 1 is a valid intercept specification, like in R. This @formula also works in GLM.lm.
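For comparison, here is a minimal sketch of the GLM.jl behaviour referred to above. The names `m_with` and `m_without` are placeholders introduced for illustration; the data is regenerated the same way as in the session above, so the estimates will differ from run to run.

```julia
using DataFrames, GLM

# Same shape of data as in the REPL session above (random, so values vary)
df = DataFrame(x=randn(100), y=3randn(100) .+ 1)

# Explicit intercept: accepted by GLM.lm
m_with = lm(@formula(x ~ 1 + y), df)

# Explicit "no intercept": GLM.lm fits the slope only
m_without = lm(@formula(x ~ 0 + y), df)

# Inspect which terms ended up in each model
println(coefnames(m_with))
println(coefnames(m_without))
```

In GLM.jl the `1` and `0` in the formula decide whether an intercept term is part of the fitted model, which is the behaviour this issue asks Lasso.jl to match.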

Code that works

If I write @formula(x ~ y), Lasso.jl will automatically fit a model with an intercept:

julia> fit(LassoModel, @formula(x ~ y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

I assume the first coefficient is the intercept and the second one is multiplied by y, so the model is:

x = -0.132743 + 0.0497596 * y

So, intercepts are supported, but I can't manually specify that I want an intercept.

More code that doesn't work

Let's fit a model without an intercept. I specify this with the 0 in @formula(x ~ 0 + y).

julia> fit(LassoModel, @formula(x ~ 0 + y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ 0 + y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

It seems the package ignored the zero in the formula, fitted an intercept of -0.132743 anyway, and produced the same model as above, even though the @formula is different. R's glmnet has supported fitting without an intercept since 2013.
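A possible explanation, offered as a sketch rather than a confirmed diagnosis: the error message in the first stack trace is raised by StatsModels when a model type opts out of formula-controlled intercepts through the `drop_intercept` trait. The snippet below only queries that trait and the formula helpers. It assumes `StatsModels.drop_intercept`, `StatsModels.hasintercept` and `StatsModels.omitsintercept` are available under these names in the installed StatsModels version; the variable names are illustrative.

```julia
using Lasso, StatsModels

# Does the model type ask StatsModels to leave the intercept out of the model matrix?
# (The ArgumentError in the stack trace above is only thrown when this is true.)
drops = StatsModels.drop_intercept(LassoModel)

# How StatsModels reads the two formulas from this issue
explicit_one = StatsModels.hasintercept(@formula(x ~ 1 + y))
explicit_zero = StatsModels.omitsintercept(@formula(x ~ 0 + y))

println((drops, explicit_one, explicit_zero))
```

If the trait is set, the formula never contributes an intercept column, so an explicit `1` is rejected, while an explicit `0` leaves the model matrix unchanged and the intercept is decided elsewhere.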


It would be nice if it were possible to specify the intercept in the formula.

Versions

patrickm663 commented 1 year ago

Hi @ForceBru

When using Lasso.jl, I noticed that the intercept has to be excluded through a keyword argument to fit(), as in fit(LassoModel, ...; intercept=false), rather than in @formula(...) as with GLM.jl. I haven't stepped through the source code to understand why.
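The suggestion above can be sketched as follows, assuming the `intercept` keyword is forwarded from the formula method of `fit` to the underlying Lasso.jl fit as described. `m_int` and `m_noint` are illustrative names, and the data is random, so the estimates will not match the numbers quoted in the issue.

```julia
using DataFrames, Lasso

# Same shape of data as in the original report
df = DataFrame(x=randn(100), y=3randn(100) .+ 1)

# Default: Lasso.jl adds an intercept itself
m_int = fit(LassoModel, @formula(x ~ y), df)

# Workaround: turn the intercept off with the keyword, not the formula
m_noint = fit(LassoModel, @formula(x ~ y), df; intercept=false)

# Compare the number of fitted coefficients in each model
println(length(coef(m_int)))
println(length(coef(m_noint)))
```

Note that the formula stays `x ~ y` in both calls; writing `x ~ 1 + y` would still raise the ArgumentError shown at the top of the issue.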

I hope this helps.

Regards, Patrick