CederGroupHub / sparse-lm

Sparse Linear Regression Models
https://cedergrouphub.github.io/sparse-lm

One standard error search gives smaller alpha than minimum CV search #96

Closed: qchempku2017 closed this issue 11 months ago

qchempku2017 commented 11 months ago

When running tests to find the optimal alpha in Lasso, I've found that one-standard-error selection gives a smaller alpha than minimum-CV selection. This should not be the case.

One standard error rule: [image]

Minimum CV rule: [image]

Expected Behavior

The alpha selected by the one-standard-error rule should be at least as large as the alpha selected by the minimum-CV rule.

Current Behavior

The one-standard-error rule can select a smaller alpha than the minimum-CV rule, as shown in the plots above.

Possible Solution

It turns out that the implementation in model_selection.py is problematic: https://github.com/CederGroupHub/sparse-lm/blob/f7bedb3bd2ca672f13b3547552b6559429c94991/src/sparselm/model_selection.py#L189 Here we use:

            # sum of the parameter values at each grid point
            params_sum = np.sum(params, axis=0)
            # distance of each mean score to the one-std threshold m - sig
            one_std_dists = np.abs(metrics - m + sig)
            # keep only the grid points closest to that threshold
            candidates = np.arange(len(metrics))[
                one_std_dists < (np.min(one_std_dists) + 0.1 * sig)
            ]
            # among those, pick the one with the largest summed parameters
            best_index = candidates[np.argmax(params_sum[candidates])]

in order to find the best alpha. This implementation cannot guarantee that the one-std rule always yields a larger alpha than the minimum-CV rule: candidates is restricted to the grid points whose mean score is closest to the one-std threshold, and if the closest such point lies on the low-alpha side of the CV optimum, the argmax over params_sum can only choose among low-alpha candidates.
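A small numeric sketch of the failure mode (the grid and scores below are made up for illustration; metrics holds the mean CV scores, higher is better, and m and sig are the best mean score and its standard deviation, matching the snippet above):

import numpy as np

# hypothetical mean CV scores on an alpha grid ordered from small to large alpha
metrics = np.array([0.70, 0.86, 0.90, 0.80, 0.75])
m = np.max(metrics)  # 0.90 at index 2: the minimum-CV choice
sig = 0.04           # assumed std of the test scores at the best index

# distances to the one-std threshold m - sig = 0.86
one_std_dists = np.abs(metrics - m + sig)
candidates = np.arange(len(metrics))[
    one_std_dists < (np.min(one_std_dists) + 0.1 * sig)
]
print(candidates)  # [1] -- only a point on the low-alpha side survives,
                   # so the rule ends up choosing a smaller alpha than index 2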

Steps to Reproduce

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, train_test_split

from sparselm.model_selection import GridSearchCV

X, y, coef = make_regression(
    n_samples=200,
    n_features=100,
    n_informative=10,
    noise=40.0,
    bias=-15.0,
    coef=True,
    random_state=0,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# create estimators
lasso = Lasso(fit_intercept=True)

# create cv search objects for each estimator
cv5 = KFold(n_splits=5, shuffle=True, random_state=0)
params = {"alpha": np.logspace(-1, 1, 10)}

lasso_cv_std = GridSearchCV(
    lasso, params, opt_selection_method="one_std_score", cv=cv5, n_jobs=-1
)
lasso_cv_opt = GridSearchCV(
    lasso, params, opt_selection_method="max_score", cv=cv5, n_jobs=-1
)

# fit models on training data
lasso_cv_std.fit(X_train, y_train)
lasso_cv_opt.fit(X_train, y_train)

std_cv_mean = -lasso_cv_std.cv_results_["mean_test_score"]
std_cv_std = lasso_cv_std.cv_results_["std_test_score"]
print("Best params:", lasso_cv_std.best_params_)
print("log best param:", np.log(lasso_cv_std.best_params_["alpha"]))
print("Best cv:", -lasso_cv_std.best_score_)

plt.plot(np.log(params["alpha"]), std_cv_mean, color="k")
plt.fill_between(np.log(params["alpha"]), std_cv_mean - std_cv_std, std_cv_mean + std_cv_std)
plt.scatter([np.log(lasso_cv_std.best_params_["alpha"])], [-lasso_cv_std.best_score_], color="r", s=100)

opt_cv_mean = -lasso_cv_opt.cv_results_["mean_test_score"]
opt_cv_std = lasso_cv_opt.cv_results_["std_test_score"]
print("Best params:", lasso_cv_opt.best_params_)
print("log best param:", np.log(lasso_cv_opt.best_params_["alpha"]))
print("Best cv:", -lasso_cv_opt.best_score_)

plt.plot(np.log(params["alpha"]), opt_cv_mean, color="k")
plt.fill_between(np.log(params["alpha"]), opt_cv_mean - opt_cv_std, opt_cv_mean + opt_cv_std)
plt.scatter([np.log(lasso_cv_opt.best_params_["alpha"])], [-lasso_cv_opt.best_score_], color="r", s=100)
plt.show()
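Running this script, the one_std_score search reports a smaller best alpha than the max_score search, reproducing the plots above.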


Possible Implementation
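A minimal sketch of one possible fix (this is not the library's actual patch; it assumes scores are higher-is-better and that a larger params_sum means a more regularized model, as in the snippet above): keep every grid point whose mean score clears the one-std threshold, then take the most regularized one.

import numpy as np

def one_std_best_index(metrics, stds, params_sum):
    # index of the minimum-CV (maximum mean score) model
    best = np.argmax(metrics)
    # one-standard-error threshold below the best mean score
    threshold = metrics[best] - stds[best]
    # every model whose mean score is within one std of the best;
    # the best index itself always qualifies, so this is never empty
    admissible = np.flatnonzero(metrics >= threshold)
    # among those, pick the most regularized (largest summed parameters)
    return admissible[np.argmax(params_sum[admissible])]

Because the best index always clears its own threshold, the selected model can never be less regularized than the minimum-CV choice, which restores the expected ordering of the two alphas.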