QB3 / sparse-ho

Fast hyperparameter settings for non-smooth estimators:
http://qb3.github.io/sparse-ho
BSD 3-Clause "New" or "Revised" License

RCV1_train bug #122

Open ksehic opened 3 years ago

ksehic commented 3 years ago

Hi @QB3

I was running Sparse-HO on rcv1_train with gradient descent and ran into a very strange bug. The first iteration is fine, but then the gradients blow up and the run fails with "ValueError: 0 or negative weights are not supported." It is alpha index 71 that triggers the bug...

Iteration 1/5 ||Value outer criterion: 2.16e-01 ||norm grad 1.30e-01
Iteration 2/5 ||Value outer criterion: 2.13e-01 ||norm grad 1.42e+37
Iteration 3/5 ||Value outer criterion: 2.13e-01 ||norm grad 7.62e+37
sparse-ho/sparse_ho/algo/implicit_forward.py:151: RuntimeWarning: invalid value encountered in double_scalars
Iteration 4/5 ||Value outer criterion: 2.13e-01 ||norm grad 9.90e+77
sparse-ho/sparse_ho/algo/forward.py:132: RuntimeWarning: overflow encountered in exp

ValueError: 0 or negative weights are not supported.

The ValueError is raised from the celer_path function.
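For what it's worth, the "overflow encountered in exp" warning from sparse_ho/algo/forward.py suggests the weights are updated through their logarithm, so a gradient of order 1e+37 would immediately push exp(...) to 0.0 or inf. This is only my guess at the mechanism, illustrated with plain NumPy (not sparse-ho's actual code):

```python
import numpy as np

# Hypothetical illustration of the failure mode, not sparse-ho internals:
# a log-parametrized weight hit by a gradient of the magnitude in the log.
log_alpha = np.log(1e-3)   # a typical regularization strength
grad = 1.42e37             # gradient norm reported at iteration 2

# A step in one direction underflows exp(...) to exactly 0.0, which celer
# then rejects as "0 or negative weights are not supported" ...
alpha_down = np.exp(log_alpha - grad)
print(alpha_down)  # 0.0

# ... and a step in the other direction overflows to inf.
with np.errstate(over="ignore"):
    alpha_up = np.exp(log_alpha + grad)
print(alpha_up)  # inf
```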

Here is the code to reproduce the bug:

# imports follow the sparse-ho examples; adjust to your version if needed
import numpy as np
from libsvmdata import fetch_libsvm
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import mean_squared_error
from celer import Lasso
from sparse_ho import ImplicitForward, grad_search
from sparse_ho.models import WeightedLasso
from sparse_ho.criterion import HeldOutMSE, CrossVal
from sparse_ho.optimizers import GradientDescent
from sparse_ho.utils import Monitor

n_splits = 5
tol = 1e-4
seed = 42
n_repeat = 5

X, y = fetch_libsvm('rcv1_train')

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=seed)

kf = KFold(shuffle=True, n_splits=n_splits, random_state=seed)

alpha_max = np.max(np.abs(X_train.T @ y_train)) / len(y_train)
alpha_min = alpha_max / 1e3
alpha_default = (alpha_max / 10) * np.ones((X_train.shape[1],))

n_alphas = 100
alphas = np.geomspace(alpha_max, alpha_min, n_alphas)

alpha_index = 71
alpha_weights = alphas[alpha_index] * np.ones(X_train.shape[1])

estimator = Lasso(fit_intercept=False, max_iter=100, warm_start=True)
model = WeightedLasso(estimator=estimator)
sub_criterion = HeldOutMSE(None, None)
criterion = CrossVal(sub_criterion, cv=kf)
algo = ImplicitForward()
monitor = Monitor()
optimizer = GradientDescent(n_outer=5, tol=tol,
                            verbose=True, p_grad_norm=1.9)
grad_search(
    algo, criterion, model, optimizer, X_train, y_train,
    alpha_weights, monitor)

estimator.weights = monitor.alphas[-1]
estimator.fit(X_train, y_train)

mspe = mean_squared_error(estimator.predict(X_test), y_test)
ksehic commented 3 years ago

I was playing with p_grad_norm, and with a very small value such as 0.01 the bug does not appear. However, the gradients still grow instead of converging to zero.

Dataset: rcv1_train
Iteration 1/5 ||Value outer criterion: 2.16e-01 ||norm grad 1.30e-01
Iteration 2/5 ||Value outer criterion: 2.16e-01 ||norm grad 1.42e-01
Iteration 3/5 ||Value outer criterion: 2.16e-01 ||norm grad 2.25e-01
Iteration 4/5 ||Value outer criterion: 2.16e-01 ||norm grad 3.93e-01
Iteration 5/5 ||Value outer criterion: 2.16e-01 ||norm grad 6.60e-01
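As a stopgap, the standard fix for exploding gradients is to rescale the gradient whenever its norm exceeds a threshold. The helper below is my own sketch of that idea (clip_grad and its threshold are not part of sparse-ho):

```python
import numpy as np

def clip_grad(grad, max_norm=1.0):
    """Rescale grad so its l2 norm never exceeds max_norm.

    Hypothetical helper, not part of sparse-ho: standard gradient-norm
    clipping, which would tame the exploding outer gradients above.
    """
    grad = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# An exploding gradient like the ones in the log is rescaled to norm 1 ...
g = clip_grad(np.array([1.42e37, 0.0]))
print(np.linalg.norm(g))  # 1.0

# ... while a well-behaved gradient passes through unchanged.
print(clip_grad(np.array([0.1, 0.0])))
```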

I tried the same case with your other gradient optimizers, Adam and line search.

optimizer = Adam(n_outer=5, lr=0.11, verbose=True, tol=tol)

Adam shows the same strange behavior: the gradients increase with each iteration.

Dataset: rcv1_train
Iteration 1/5 || Value outer criterion: 2.16e-01 || norm grad 1.30e-01
Iteration 2/5 || Value outer criterion: 2.12e-01 || norm grad 1.87e-01
Iteration 3/5 || Value outer criterion: 2.08e-01 || norm grad 3.62e+28
Iteration 4/5 || Value outer criterion: 2.08e-01 || norm grad 1.72e+56

optimizer = LineSearch(n_outer=5, verbose=True, tol=tol)

Line search seems to be stable, but the outer criterion jumps around from one iteration to the next.

Dataset: rcv1_train
Iteration 1/5 || Value outer criterion: 2.16e-01 || norm grad 1.30e-01
Iteration 2/5 || Value outer criterion: 2.18e-01 || norm grad 1.22e-01
Iteration 3/5 || Value outer criterion: 2.17e-01 || norm grad 1.25e-01
Iteration 4/5 || Value outer criterion: 2.18e-01 || norm grad 1.36e-01
Iteration 5/5 || Value outer criterion: 2.22e-01 || norm grad 1.44e-01
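To quantify the jumps, here is a quick check on the outer-criterion values transcribed by hand from the log above (in a real run they could instead be read off the Monitor object):

```python
import numpy as np

# Outer-criterion values copied from the line-search log above.
objs = np.array([2.16e-01, 2.18e-01, 2.17e-01, 2.18e-01, 2.22e-01])

# A converging line search should (mostly) decrease the criterion;
# here it goes up in 3 of the 4 steps.
n_increases = int((np.diff(objs) > 0).sum())
print(n_increases)  # 3
```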