JuliaStats / GLMNet.jl

Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet

`cv.meanloss` differs from `cv$cvm` in R #70

Open gfrt0 opened 1 year ago

gfrt0 commented 1 year ago

In trying to get the cross-validation output from glmnet in R and GLMNet.jl to agree, I find the losses differ even when everything else (lambda sequence, fold id) is identical across the two. This yields an `argmin(cv.meanloss)` (Julia) different from `which.min(cv$cvm)` (R), and the difference sometimes changes which model is selected. What is the source of the difference?

Example:

R

require(glmnet)

data <- iris
foldid <- rep(1:10, nrow(data) / 10)
x <- model.matrix(data = data, ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width)

cvl <- cv.glmnet(y = data$Species, x = x, 
                 family = "multinomial", alignment = "fraction", foldid = foldid)

round(cvl$lambda, 8)

cvl$cvm
2.1972246 2.0531159 1.9324868 ...

Julia

using Pkg

Pkg.add("RDatasets")
Pkg.add("GLMNet")
Pkg.add("GLM")

using RDatasets, GLMNet, GLM

iris = dataset("datasets", "iris")

fml = @formula(Species ~ SepalLength + SepalWidth + PetalLength + PetalWidth)
x = ModelMatrix(ModelFrame(fml, iris)).m   # includes an intercept column, matching R's model.matrix
foldid = repeat(1:10, size(iris, 1) ÷ 10)
cvl = glmnetcv(x, iris.Species; folds = foldid)

cvl.lambda'

cvl.meanloss
2.1955639962247964
2.0530748153377423
1.9324668652650965
...
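One way to narrow this down: glmnet's multinomial `cvm` is the mean multinomial deviance, −2 × mean log-probability assigned to the true class. At the largest lambda the per-fold fits are null models, and since these folds are perfectly balanced (5 of each species per fold), every fitted probability is 1/3, giving a deviance of 2·log(3) ≈ 2.1972246 — consistent with R's first `cvm` entry, while the first `cv.meanloss` entry (2.19556…) is slightly below it. A minimal sketch of that check (the `multinomial_deviance` helper is mine, not part of either package):

```python
import math

def multinomial_deviance(probs, labels):
    """Mean multinomial deviance: -2 * mean(log p_hat[true class])."""
    n = len(labels)
    return -2.0 / n * sum(math.log(p[y]) for p, y in zip(probs, labels))

# Null model on balanced 3-class data (150 obs): every class probability is 1/3.
probs = [[1 / 3, 1 / 3, 1 / 3]] * 150
labels = [i % 3 for i in range(150)]
print(multinomial_deviance(probs, labels))  # ≈ 2.1972246, matching R's first cvm entry
```

If the first Julia entry does not equal this null-model value, that would suggest the two packages differ in how they score the held-out folds (or in which lambda the first entry corresponds to) rather than in the fits themselves.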