JuliaAI / MLJLinearModels.jl

Generalized Linear Regressions Models (penalized regressions, robust regressions, ...)
MIT License

coef names #61

Closed aviatesk closed 4 years ago

aviatesk commented 4 years ago

addresses https://github.com/alan-turing-institute/MLJLinearModels.jl/issues/60

requires:

/cc @tlienart @ablaom

aviatesk commented 4 years ago

So for CI to pass without any failures, we have to wait for those PRs to be merged and for new versions to be tagged for each package:

Then we can just bump up the compat section for MLJModelInterface.jl

aviatesk commented 4 years ago

I will bump the MLJModelInterface.jl compat once the new version of MLJBase is released. Then the compats would be all good and we can check how CI goes. (maybe after that I will add a test on this)

/cc @ablaom

ablaom commented 4 years ago

@aviatesk I think you can update the [compat] now and retrigger CI

aviatesk commented 4 years ago

Nice, thanks for letting me know about the new release.

aviatesk commented 4 years ago

okay, so now fitted_params returns:

binary classification:

julia> fitted_params(mach)
(classes = CategoricalString{UInt32}["B", "O"],
coefs = [:FL => 2.6201486950992203, :RW => 0.39292726376244974, :CL => 0.42944513973557696, :CW => -2.364090952736262, :BD => 2.184956788698077],
intercept = -3.805377536908276,)

multiclass classification:

julia> fitted_params(mach)
(classes = CategoricalString{UInt32}["setosa", "versicolor", "virginica"],
coefs = Pair{Symbol,SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true}}[:sepal_length => [0.2059480869774525, 0.15620597909667314, -0.3621540660741237], :sepal_width => [1.329471903695349, -0.5634731456659896, -0.765998758029362], :petal_length => [-2.4826936165327167, -0.1984673279849103, 2.6811609445176328], :petal_width => [-1.1274927833807755, -0.9013966540870679, 2.0288894374678437]],
intercept = [0.22683530315174644, -0.2268353017119584, -15.374523249617724],)

regression:

julia> fitted_params(mach)
(coefs = [:Crim => -0.09703973833125352, :Zn => 0.09842256842389137, :Indus => 0.05193156265302136, :NOx => 0.08164973034302989, :Rm => 1.9136770861182466, :Age => 0.08012259423475827, :Dis => -0.1673060345082267, :Rad => 0.2033714467207396, :Tax => -0.012765891530541989, :PTRatio => 0.5304063436035396, :Black => 0.0199836811158726, :LStat => -0.7721969054676527],
intercept = 0.17859876246704648,)

where coefs is now a vector of feature name => coefficient pairs when the training data carries schema information; otherwise it is just the coefficients, as before.
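A minimal sketch of how the paired form could be consumed, using only Base Julia. The named tuple `fp` is a hand-written stand-in for the regression output above (values rounded, three features only), not an actual call to `fitted_params`; the variable names are illustrative.

```julia
# Hypothetical stand-in for the regression `fitted_params(mach)` output above
# (values rounded, truncated to three features).
fp = (coefs = [:Crim => -0.0970, :Zn => 0.0984, :Indus => 0.0519],
      intercept = 0.1786)

# Look up a coefficient by feature name.
coef_by_name = Dict(fp.coefs)
zn_coef = coef_by_name[:Zn]

# Split the pairs into parallel vectors, e.g. as x-labels and bar heights.
feature_names = first.(fp.coefs)
coef_values = last.(fp.coefs)

# Rank features by absolute coefficient size, largest first.
ranked = sort(fp.coefs; by = p -> abs(last(p)), rev = true)
```

Because each name travels with its coefficient, sorting or filtering the vector cannot misalign the two.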

It now also returns a classes field for classification models, which lets users see which coefficient corresponds to which class.
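A rough sketch of that matching for the multiclass case, in Base Julia. It assumes that position i in each coefficient vector belongs to the i-th entry of classes; `fp` is a hand-written stand-in for the output above (values rounded, two features, plain strings instead of categorical values).

```julia
# Hypothetical stand-in for the multiclass `fitted_params(mach)` output above.
fp = (classes = ["setosa", "versicolor", "virginica"],
      coefs = [:sepal_length => [0.206, 0.156, -0.362],
               :petal_length => [-2.483, -0.198, 2.681]],
      intercept = [0.227, -0.227, -15.375])

# Assumption: entry i of each coefficient vector corresponds to fp.classes[i].
per_class = Dict(name => Dict(zip(fp.classes, vals)) for (name, vals) in fp.coefs)

# Coefficient of petal_length for the "virginica" class.
c = per_class[:petal_length]["virginica"]
```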


One point for discussion: we may want to keep the coefs field as a plain vector of coefficients and put the feature names in a separate field (say, features), just as classes is a separate field. That way, the coefs field of fitted_params(some_model) would always be a Vector{Float64}, whether or not the training data contains schema information, which would be more consistent. The downside is that users would have to match each feature name to its coefficient manually, e.g. for plotting. I'm not sure which way we want to go, so I'm very open to hearing your ideas.
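To make the trade-off concrete, here is a small sketch of both layouts in Base Julia. The `features` field is only the name floated above, not something implemented in this PR, and the values are rounded from the binary classification output.

```julia
# Layout in this PR: one vector of name => coefficient pairs (values rounded).
paired = (coefs = [:FL => 2.62, :RW => 0.39, :CL => 0.43], intercept = -3.81)

# Alternative layout: names kept in a separate, hypothetical `features` field.
separate = (features = first.(paired.coefs),
            coefs = last.(paired.coefs),
            intercept = paired.intercept)

# The manual matching step users would then need, e.g. before plotting.
rematched = Pair.(separate.features, separate.coefs)
```

Converting between the two is a one-liner in either direction, so the choice is mostly about which form is the more convenient default.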

aviatesk commented 4 years ago

Possible discussion would be that we may want to keep coefs fields as just vector of coefs and keep those feature names as a separate field (say, features), as classes are so.

What do you think about this, @tlienart?

tlienart commented 4 years ago

sorry this is fine but fails on nightly due to some weird unrelated issue; could I ask you to allow failure on nightly in the travis yml file? thanks!

aviatesk commented 4 years ago

okay then let's go with the current implementation.

could I ask you to allow failure on nightly in the travis yml file?

done 👍

codecov-io commented 4 years ago

Codecov Report

Merging #61 into dev will decrease coverage by 1.74%. The diff coverage is 36.84%.


@@            Coverage Diff             @@
##              dev      #61      +/-   ##
==========================================
- Coverage   94.78%   93.03%   -1.75%     
==========================================
  Files          22       22              
  Lines         805      790      -15     
==========================================
- Hits          763      735      -28     
- Misses         42       55      +13     
Impacted Files                 Coverage Δ
src/glr/constructors.jl        100.00% <ø> (ø)
src/mlj/interface.jl           72.00% <36.84%> (-18.70%)
src/loss-penalty/utils.jl      71.42% <0.00%> (-14.29%)
src/loss-penalty/generic.jl    86.07% <0.00%> (-1.27%)
src/fit/proxgrad.jl            94.11% <0.00%> (-0.89%)
src/glr/d_robust.jl            91.05% <0.00%> (-0.22%)
src/utils.jl                   97.33% <0.00%> (-0.04%)
src/glr/d_l2loss.jl            100.00% <0.00%> (ø)
src/glr/d_logistic.jl          100.00% <0.00%> (ø)
... and 2 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 100162a...1c05e13.

tlienart commented 4 years ago

Thanks