SteffenMoritz / ridge

CRAN R Package: Ridge Regression with automatic selection of the penalty parameter
GNU General Public License v2.0
18 stars 11 forks source link

ridge:::vcov.ridgeLinear computes wrong ridge standard errors #15

Open BenjaminPlanterose opened 2 years ago

BenjaminPlanterose commented 2 years ago

Dear Steffen,

I believe there are some errors in the expression employed to compute the standard errors of the ridge linear coefficients (ridge:::vcov.ridgeLinear). Specifically, it always penalizes the intercept (even when argument scaling = "none" on the ridge::linearRidge call) and it uses an incomplete formula. The proper formula for the Ridge standard errors is:

Where Gamma is the unitary matrix of dimensions (m+1)×(m+1) but where element i=1, j=1, is set to zero to avoid penalizing the intercept (only in the case of scaling = "none"). And it is not as programmed in ridge:::vcov.ridgeLinear:

You may find references for this formula below: https://www.statlect.com/fundamentals-of-statistics/ridge-regression https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Ridge_Regression.pdf https://stats.stackexchange.com/questions/2121/how-can-i-estimate-coefficient-standard-errors-when-using-ridge-regression

See also below a script reproducing what I believe the proper Ridge standard errors are:

# Load libraries
library(MASS)
library(ridge)

# Generate X, y
set.seed(1)
n = 100
m = 10
mu = rnorm(m) # Generate mean vector
Sigma = rWishart(n = 1, df = m, Sigma = diag(m))[,,1] # Generate Covariance matrix
X = mvrnorm(n = n, mu = mu, Sigma = Sigma)
X_ = cbind(1, X) # X with an intercept column
THETA = rnorm(m+1)
y = X_ %*% THETA + rnorm(n, sd = 3)

# Using unscaled ridge Regression
lambda = 100
ridge_model = linearRidge(y ~ X, lambda = lambda, scaling = "none")
I = diag(m+1); I[1,1] = 0

# Linear coefficients
solve(t(X_) %*% X_ + lambda*I) %*% t(X_) %*% y
(theta_ridge = coef(ridge_model)) # Ridge coefficients coincides

# Variance of residuals
(var_epsilon = sum((y - X_ %*% theta_ridge)^2)/(n-(m+1)))

# Standard errors
var_epsilon * solve(t(X_) %*% X_ + lambda*I) %*% t(X_) %*% X_ %*% solve(t(X_) %*% X_ + lambda*I)

# However, ridge:::vcov.ridgeLinear computes:
ridge:::vcov.ridgeLinear(ridge_model)
var_epsilon * solve(t(X_) %*% X_ + lambda*diag(m+1)) # WRONG FORMULA

I am aware that due to the lack of scale invariance of the ridge estimator, that center-scaling is recommended (and hence dropping the intercept column). Also, the degrees of freedom in ridge regression are not really n-(m+1) (some authors speak of effective degrees of freedom as n minus the trace of the hat matrix).

FYI, I am using ridge_3.0 on R version 4.1.0 (2021-05-18), platform: x86_64-w64-mingw32/x64 (64-bit), running under: Windows >= 8 x64 (build 9200).

Please let me know if I have overseen anything. Have a nice day,

Best, Ben.

SteffenMoritz commented 2 years ago

Hello Benjamin,

thanks a lot for your detailed report! Extremely nice issue documentation.

I haven't been able to dig deeper yet, as I was quite busy lately. Just wanted to mention, that I have not forgotten this issue and it is on my to-do list :)