benchopt / benchmark_ridge

Benchopt benchmark for Ridge regression

Ridge with GLMNET: discrepancies #2

Open tanglef opened 2 years ago

tanglef commented 2 years ago

glmnet's ridge does not always seem to give the same results as the other solvers, even on simple datasets (and even outside of Benchopt, which is why I show the R code). To solve the ridge problem $\min \frac{1}{2}||y-X\beta||_2^2 + \frac{\lambda}{2}||\beta||_2^2$, the penalty to pass to glmnet is $\lambda_R=\frac{\sigma(y)}{n}\lambda$, with $\sigma(y)$ the standard deviation of $y$ computed with the $1/n$ convention (dividing by $n$ rather than $n-1$, cf. sd_y in the code below).
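Where this mapping comes from, as far as I understand glmnet's documented behavior: for the Gaussian family, glmnet minimizes $\frac{1}{2n}||y-X\beta||_2^2 + \lambda_R \left(\frac{1-\alpha}{2}||\beta||_2^2 + \alpha||\beta||_1\right)$, and it internally standardizes $y$ by its $1/n$ standard deviation, interpreting the user-supplied $\lambda_R$ on that standardized scale. Undoing the $\frac{1}{n}$ factor on the datafit and the $\sigma(y)$ rescaling of the response to match $\min \frac{1}{2}||y-X\beta||_2^2 + \frac{\lambda}{2}||\beta||_2^2$ gives $\lambda_R = \frac{\sigma(y)}{n}\lambda$. The first snippet below checks this numerically.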

library(glmnet)

set.seed(123)
n <- 1000
p <- 100
X <- matrix(rnorm(n * p, 0, 1), n, p)
beta <- rnorm(p, 0, 1)
Y <- X %*% beta + rnorm(n, 0, 0.5)
# standard deviation of Y with the 1/n convention
sd_y <- sqrt(var(Y) * (n - 1) / n)[1, 1]
# closed-form ridge solution (X'X + lambda I)^{-1} X'Y, with lambda = 10
beta1 <- solve(t(X) %*% X + 10 * diag(p), t(X) %*% Y)[, 1]
# glmnet with the rescaled penalty (alpha close to 0, i.e. essentially pure ridge)
fit_glmnet <- glmnet(X, Y, alpha = 0.001, standardize = FALSE, intercept = FALSE,
                     thresh = 1e-20, lambda = sd_y * 10 / n)
beta2 <- as.vector(coef(fit_glmnet))[-1]
cbind(beta1[1:10], beta2[1:10])  # same beta
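A more direct check than eyeballing the first ten coefficients, using the variables defined above:

max(abs(beta1 - beta2))  # should be essentially zero, consistent with "same beta"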

But on datasets like bodyfat:

# bodyfat data, assumed already loaded (e.g. data(bodyfat, package = "mfp")):
# columns 4:17 are the numeric predictors, siri is the response
X <- bodyfat[, 4:17]
X <- sapply(X, as.numeric)
Y <- bodyfat$siri
p <- dim(X)[2]
n <- dim(X)[1]
sd_y <- sqrt(var(Y) * (n - 1) / n)
# closed-form ridge solution with lambda = 10
beta1 <- solve(t(X) %*% X + 10 * diag(p), t(X) %*% Y)
fit_glmnet <- glmnet(X, Y, alpha = 0.001, standardize = FALSE, lambda = sd_y * 10 / n,
                     intercept = FALSE, thresh = 1e-20, maxit = 10000000)
beta2 <- as.vector(coef(fit_glmnet))[-1]
cbind(beta1[1:10], beta2[1:10])  # not exactly the same beta
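One way to tell which solution is off, using the variables above: evaluate the ridge objective $\frac{1}{2}||y-X\beta||_2^2 + \frac{\lambda}{2}||\beta||_2^2$ at both solutions. Since beta1 is the exact minimizer, it should attain the lower value if glmnet has not fully converged on this objective:

# ridge objective with lambda = 10
obj <- function(b) 0.5 * sum((Y - X %*% b)^2) + 0.5 * 10 * sum(b^2)
c(closed_form = obj(beta1), glmnet = obj(beta2))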

@cassiofragadantas found this post, which also reports discrepancies that are not explained by the rescaling. So how should we treat this in Benchopt? (a case where a solver sometimes does not converge to the same coefficients as the others)

mathurinm commented 1 year ago

Related test failure: https://github.com/benchopt/benchmark_ridge/actions/runs/3572133863/jobs/6004677349#step:6:430

I'd skip glmnet in the tests and add a warning in the glmnet solver's docstring. @Badr-MOUFAD do you have time to take a look at it?

Badr-MOUFAD commented 1 year ago

It was among the reasons why the CI test failed in https://github.com/benchopt/benchmark_ridge/pull/4

As far as I know, glmnet uses a normalized datafit. Yet, I tried adding/removing the scaling by n_samples, and it didn't seem to solve the problem.
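For reference, this is the kind of comparison I mean (a sketch, not the exact code from the PR; alpha = 0 gives an exact ridge penalty in glmnet, and X, Y, sd_y, beta1 are as in the snippets above):

# candidate 1: penalty scaled by 1/n (normalized-datafit assumption)
fit_scaled <- glmnet(X, Y, alpha = 0, standardize = FALSE, intercept = FALSE,
                     thresh = 1e-20, lambda = sd_y * 10 / n)
# candidate 2: same penalty without the 1/n factor
fit_unscaled <- glmnet(X, Y, alpha = 0, standardize = FALSE, intercept = FALSE,
                       thresh = 1e-20, lambda = sd_y * 10)
# compare both to the closed-form solution
cbind(beta1[1:5],
      as.vector(coef(fit_scaled))[-1][1:5],
      as.vector(coef(fit_unscaled))[-1][1:5])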