Closed sindresops closed 1 year ago
Do you have an example you can share?
Apologies. I was being lazy. I have updated the original post with a minimum working example.
Ah wow that's a surprise to me! IMO the earlier behavior was a bug, since we (generally) sort terms by degree even in pre-0.7. In fact, I can't reproduce your example on my machine:
julia> using Pkg; using DataFrames; Pkg.add(name="StatsModels",version="0.6.33"); using GLM; x = collect(1:10); y = 2x .+ randn(length(x)); lm(@formula(y~x+1),DataFrame(x=x,y=y))
Resolving package versions...
No Changes to `/private/var/folders/kg/y0c0ksr56_g800hjvcpx_d4w0000gp/T/jl_ipptqs/Project.toml`
No Changes to `/private/var/folders/kg/y0c0ksr56_g800hjvcpx_d4w0000gp/T/jl_ipptqs/Manifest.toml`
[ Info: Precompiling GLM [38e38edf-8417-5370-95a0-9cbb8c7f171a]
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
y ~ 1 + x
Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)  -0.259833    0.85121   -0.31    0.7680   -2.22273    1.70306
x             2.02703     0.137185  14.78    <1e-06    1.71068    2.34338
─────────────────────────────────────────────────────────────────────────
(jl_ipptqs) pkg> st
Status `/private/var/folders/kg/y0c0ksr56_g800hjvcpx_d4w0000gp/T/jl_ipptqs/Project.toml`
[a93c6f00] DataFrames v1.5.0
[38e38edf] GLM v1.8.2
⌃ [3eaba693] StatsModels v0.6.33
Info Packages marked with ⌃ have new versions available and may be upgradable.
Ah wow wait nevermind, I misread the report. This is definitely a bug!
I think the issue is that the constant term is incorrectly assigned the same degree as x. Sorting works correctly with an interaction term like this:
julia> f2 = @formula(y ~ x & z + x + 1)
FormulaTerm
Response:
y(unknown)
Predictors:
x(unknown)
1
x(unknown) & z(unknown)
(jl_QKrdLJ) pkg> st
Status `/private/var/folders/kg/y0c0ksr56_g800hjvcpx_d4w0000gp/T/jl_QKrdLJ/Project.toml`
[a93c6f00] DataFrames v1.5.0
[38e38edf] GLM v1.8.2
[3eaba693] StatsModels v0.7.0
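To make the suspected cause concrete, here is a toy sketch of degree-based sorting. It is not StatsModels' implementation: `ToyTerm`, `degree_buggy`, and `degree_fixed` are hypothetical names made up for illustration. It only shows how a stable sort would produce the order above if the constant term were given the same degree as a main effect.

```julia
# Toy model of degree-based term sorting. All names here are hypothetical
# and are not part of StatsModels' API.
struct ToyTerm
    label::String
    vars::Vector{Symbol}   # variables involved; empty for the constant term
end

# Intended degree: number of variables, so the constant has degree 0.
degree_fixed(t::ToyTerm) = length(t.vars)

# Suspected buggy degree: the constant is treated like a main effect.
degree_buggy(t::ToyTerm) = max(length(t.vars), 1)

# Terms in the order written in @formula(y ~ x & z + x + 1)
terms = [ToyTerm("x & z", [:x, :z]), ToyTerm("x", [:x]), ToyTerm("1", Symbol[])]

# A stable sort keeps the written order among terms of equal degree.
order(deg) = [t.label for t in sort(terms; by=deg, alg=MergeSort)]

println(order(degree_buggy))  # ["x", "1", "x & z"]
println(order(degree_fixed))  # ["1", "x", "x & z"]
```

With the buggy degree, `x` and `1` tie and keep their written order, which matches the printed formula; giving the constant degree 0 puts it first.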
@sindresops this is fixed on master now and will be released as StatsModels 0.7.1: https://github.com/JuliaRegistries/General/pull/81005
Thanks for the report!
Thanks for patching!
My code was breaking because StatsModels.jl has changed the default ordering of terms in a FormulaTerm. (Intercept) used to come first; now it's last. I was depending on GLM.coef() returning the regression coefficients in that specific order. The new change is good, since now the higher-order terms come first. But just an FYI in case anyone else experiences the same.
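For anyone hitting the same thing, one way to avoid depending on term order is to look coefficients up by name instead of by position. This is a minimal sketch, assuming the same toy data as the MWE and that `coefnames` and `coef` return entries in matching order; the variable names `coefs`, `intercept`, and `slope` are just for illustration.

```julia
using DataFrames, GLM

x = collect(1:10)
y = 2x .+ randn(length(x))
m = lm(@formula(y ~ x + 1), DataFrame(x=x, y=y))

# Pair each coefficient with its name so lookups do not depend on term order.
coefs = Dict(zip(coefnames(m), coef(m)))

intercept = coefs["(Intercept)"]
slope = coefs["x"]
```

This should give the same `intercept` and `slope` whether the intercept is sorted first or last.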
Edit 1: (Added MWE)
using Pkg; using DataFrames; Pkg.add(name="StatsModels",version="0.6.33"); using GLM; x = collect(1:10); y = 2x .+ randn(length(x)); lm(@formula(y~x+1),DataFrame(x=x,y=y));
Returns
using Pkg; using DataFrames; Pkg.add(name="StatsModels",version="0.7"); using GLM; x = collect(1:10); y = 2x .+ randn(length(x)); lm(@formula(y~x+1),DataFrame(x=x,y=y));
Returns
Other info: Julia version 1.8.0, GLM v1.8.2