JuliaStats / GLMNet.jl

Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet

get coef(cvfit, s = "lambda.min") #31

Closed aw2018 closed 5 years ago

aw2018 commented 5 years ago

How do I simply get coef(cvfit, s = "lambda.min") as R does?

AsafManela commented 5 years ago

I think the example in the README.md does exactly that:

julia> cv = glmnetcv(X, y)
Least Squares GLMNet Cross Validation
55 models for 4 predictors in 10 folds
Best λ 0.343 (mean loss 76.946, std 12.546)

julia> argmin(cv.meanloss)
48

julia> cv.path.betas[:, 48]
4-element Array{Float64,1}:
 0.926911
 0.00366805
 0.0
 0.0

If not, perhaps look at other fields of the returned cv
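
If you also want the λ value itself (the analogue of R's lambda.min) and the intercept, here is a minimal sketch; it assumes the cross-validation object exposes lambda, meanloss, path.a0, and path.betas as in the example above:

# Sketch, not the package API: pick the path segment with the lowest
# mean cross-validation loss and read off λ, intercept, and coefficients.
imin       = argmin(cv.meanloss)        # index of the best segment
lambda_min = cv.lambda[imin]            # analogue of R's cvfit$lambda.min
intercept  = cv.path.a0[imin]           # intercept at that λ
betas      = cv.path.betas[:, imin]     # coefficient vector at that λ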

aw2018 commented 5 years ago

Thanks for the response, but the question is which field should be used to get the coefficients automatically at lambda.min, as the R package does. The R code is shown in https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html


AsafManela commented 5 years ago

I would guess it's cv.meanloss, but you could try running the same example data in both R and GLMNet.jl and compare.

aw2018 commented 5 years ago

cv.meanloss should be a series of mean losses, one per λ, as shown below. What I need is a coefficient matrix so I can select variables for a small s.

julia> cv = glmnetcv(X, y)
Least Squares GLMNet Cross Validation
55 models for 4 predictors in 10 folds
Best λ 0.241 (mean loss 112.897, std 16.404)

julia> cv.meanloss
55-element Array{Float64,1}:
 877.2810878855138
 761.7422582965088
 650.8555199966314
 558.8059633274238
 482.39442752396894
 418.96499646821854
 366.3127578869651
 322.60722407531716
 286.3287550353564
 256.2157763950583
 231.22095929491542
 210.47484061338398
 193.33085107722835
   ⋮
 112.96750068583216
 112.94587149513234
 112.92697317790956
 112.91614261629186
 112.90772816957528
 112.90267191956609
 112.90089098278163
 112.89809237633227
 112.89745539845951
 112.8982327601517
 112.89995121762695
 112.90229334644275

AsafManela commented 5 years ago

That is what the other two lines I quoted earlier do: argmin(cv.meanloss) gives the index of the path segment with the lowest MSE (48 in the example), and cv.path.betas[:, argmin(cv.meanloss)] gives the coefficient vector at that segment.
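
Wrapping those pieces in a small helper gets you close to R's coef(cvfit, s = "lambda.min"). This is only a sketch: coef_lambda_min is a hypothetical name, and the field names (meanloss, path.a0, path.betas) are assumed from the examples above.

using GLMNet

# Sketch of a helper mimicking R's coef(cvfit, s = "lambda.min"):
# returns [intercept; coefficients] at the λ with the lowest mean CV loss.
function coef_lambda_min(cv)
    imin = argmin(cv.meanloss)                    # segment with lowest mean loss
    return [cv.path.a0[imin]; cv.path.betas[:, imin]]
end

# Usage (hypothetical): the first entry is the intercept, so the selected
# variables are the nonzero entries of the remaining coefficients.
# β = coef_lambda_min(cv)
# selected = findall(!iszero, β[2:end])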

aw2018 commented 5 years ago

Thanks