JuliaAI / MLJLinearModels.jl

Generalized Linear Regressions Models (penalized regressions, robust regressions, ...)
MIT License

Feature names for LinearRegressor fitted_params #60

Closed: venuur closed this issue 3 years ago

venuur commented 4 years ago

Currently, the fitted parameters (coefficients and intercept) for LinearRegressor do not include feature names. This makes it complicated to map coefficients back to their input columns when the model is combined with transformations such as OneHotEncoder. For example:

julia> fps = fitted_params(selected_model)

julia> fps.machines
3-element Array{Any,1}:
 NodalMachine{UnivariateStandardizer} @ 5…02
 NodalMachine{LinearRegressor} @ 6…58
 NodalMachine{OneHotEncoder} @ 8…55

julia> fps.fitted_params_given_machine[fps.machines[2]]
(coefs = [XXX],
 intercept = XXX,)

I think it would be a useful diagnostic feature to have something like:

julia> fps.fitted_params_given_machine[fps.machines[2]]
(coefs = [XXX],
 intercept = XXX,
 names = [XXX])
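
A minimal sketch of the proposed output shape, in plain Julia and independent of MLJ. The helper with_names is hypothetical (it is not part of MLJLinearModels). It assumes the feature names are already known and listed in the same order as the coefficients, and it simply returns a NamedTuple with an extra names field.

```julia
# Hypothetical helper (not part of MLJLinearModels): bundle coefficients,
# intercept and feature names into one NamedTuple, as proposed above.
# Assumes names[i] labels coefs[i].
function with_names(coefs::AbstractVector{<:Real}, intercept::Real,
                    names::AbstractVector{Symbol})
    length(coefs) == length(names) ||
        throw(DimensionMismatch("got $(length(coefs)) coefficients but $(length(names)) names"))
    return (coefs = coefs, intercept = intercept, names = names)
end

# Made-up values, for illustration only
fp = with_names([0.5, -1.2, 2.0], 0.3, [:A, :B__1, :B__2])

# Print one "name => coefficient" line per feature
for (name, c) in zip(fp.names, fp.coefs)
    println(name, " => ", c)
end
```

Keeping coefs and intercept as the first two fields means code that only reads those fields would be unaffected by the extra names field.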

aviatesk commented 4 years ago

Hi, I'm also interested in this. If we have:

using MLJ, MLJLinearModels, DataFrames  # DataFrames provides DataFrame

data = DataFrame(A = rand(10000), B = categorical(rand((1:5),10000)))
label = categorical(rand((1,2,3), 10000))
lr = @pipeline LogRegPipe(
    encode = OneHotEncoder(),
    lr = MultinomialClassifier()
) prediction_type = :probabilistic
m = machine(lr, data, label)
MLJ.fit!(m)

ms, params = fitted_params(m)
coefs = params[ms[1]][1] # 6 x 3 array

Is my understanding correct that coefs would then look like this?

7×3 Array{Float64,2}:
  0.00883906   0.0285588   -0.0373979  # <- intercept
  0.0259289   -0.0593046    0.0333757  # <- A
 -0.047085    -0.0247367    0.0718217  # <- B__1
  0.019338     0.0152594   -0.0345974  # <- B__2
  0.0185169    0.0486821   -0.0671989  # <- B__3
 -0.0152293    0.0186303   -0.00340103 # <- B__4
  0.00146942  -0.00146942  -0.00845713 # <- B__5

If you can confirm this, I think I can work on upstreaming it into this package.

/cc @tlienart @ablaom
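
To produce a names field for a table like the one above, the one-hot expansion has to be mirrored. Below is a sketch in plain Julia with a hypothetical helper, expanded_names. It assumes, following the annotations in the question, that a continuous feature keeps its name and that a categorical feature B with levels 1 to 5 becomes the columns B__1, ..., B__5 in level order. Whether the encoder actually used follows this naming and ordering would need to be checked.

```julia
# Hypothetical helper: mirror a one-hot expansion to get one name per
# model column. Assumes continuous features keep their name and that a
# categorical feature f with levels l1, l2, ... becomes f__l1, f__l2, ...
# (the naming used in the annotated table above).
function expanded_names(features::Vector{Symbol},
                        levels_of::Dict{Symbol,<:AbstractVector})
    out = Symbol[]
    for f in features
        if haskey(levels_of, f)
            # categorical: one column per level, in level order
            append!(out, [Symbol(f, "__", l) for l in levels_of[f]])
        else
            # continuous: passed through unchanged
            push!(out, f)
        end
    end
    return out
end

# A is continuous, B is categorical with levels 1:5
feature_names = expanded_names([:A, :B], Dict(:B => 1:5))
println(feature_names)  # six names: A, then B__1 to B__5
```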

tlienart commented 4 years ago

The last row is the intercept. Shift everything up by one row and put the intercept in the last row, and then it's correct.

PS: I'm aware there are things missing in MLJLinearModels; it's more a matter of finding the time to do it...
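
Following this answer, the 7×3 matrix in the question holds the six feature coefficients in rows 1 to 6 and the per-class intercepts in row 7. The plain-Julia sketch below uses a hypothetical helper, split_coefs, to separate the two and attach names. It assumes the feature names are known and in column order (A, B__1, ..., B__5). The matrix values are made up and are not taken from a real fit.

```julia
# Hypothetical helper: split a (p + 1) × c parameter matrix whose LAST row
# holds the intercepts (layout as described above) into coefficients and
# intercepts, and pair each coefficient row with its feature name.
function split_coefs(W::AbstractMatrix{<:Real}, names::AbstractVector{Symbol})
    p = size(W, 1) - 1
    p == length(names) ||
        throw(DimensionMismatch("expected $(p) names, got $(length(names))"))
    coefs = W[1:p, :]        # one row per feature, one column per class
    intercept = W[end, :]    # one intercept per class
    by_name = Dict(n => coefs[i, :] for (i, n) in enumerate(names))
    return (coefs = coefs, intercept = intercept, names = names, by_name = by_name)
end

# Illustrative values only: 6 feature rows + 1 intercept row, 3 classes
W = reshape(collect(1.0:21.0), 7, 3)
feats = [:A, :B__1, :B__2, :B__3, :B__4, :B__5]
fp = split_coefs(W, feats)

println(fp.intercept)       # [7.0, 14.0, 21.0]
println(fp.by_name[:B__2])  # [3.0, 10.0, 17.0]
```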

aviatesk commented 4 years ago

Thanks for the quick reply, okay.

it's more a matter of finding the time to do it...

Hehe, I see. Do you want to implement this yourself? Otherwise I may try to submit a PR for this if I have time next week.

tlienart commented 4 years ago

A PR would be fantastic, even a draft; I'll make sure to respond quickly to the review etc.