Thanks for the bug report, and for finding the right tab ;)
I modified your code so that it didn't fail on an import. I confirm we have a problem. It seems the issue is with the stopping criterion. @pavanramkumar do you have time to look, since you implemented this last? Here is the modified script:
import numpy as np
from sklearn.datasets.samples_generator import make_regression
from sklearn.linear_model import ElasticNet
from pyglmnet import GLM
from pyglmnet import _loss
import matplotlib.pyplot as plt


def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))


# Sparse regression problem: only 10 of 1000 features are informative.
X, Y, coef_ = make_regression(
    n_samples=1000, n_features=1000,
    noise=0.1, n_informative=10, coef=True,
    random_state=42, tail_strength=0)

loss_trace = list()
alpha = 0.1
l1_ratio = 0.5


# Record the pyglmnet loss at every iteration so we can plot its trajectory.
def callback(beta):
    Tau = None
    eta = 2.0
    group = None
    loss_trace.append(
        _loss('gaussian', l1_ratio, Tau, alpha,
              X, Y, eta, group, beta))


# Fit the same elastic-net problem with scikit-learn and pyglmnet.
sk = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                tol=1e-5).fit(X, Y)
pg = GLM(distr='gaussian', alpha=l1_ratio, reg_lambda=alpha,
         solver='cdfast', tol=1e-5, learning_rate=1e-2,
         callback=callback, max_iter=1000).fit(X, Y)

y_sk = sk.predict(X)
y_pg = pg.predict(X)

plt.plot(loss_trace)
plt.show()

print('in-sample rmse sklearn = {}, pyglmnet = {}'.format(
    rmse(Y, y_sk), rmse(Y, y_pg)))
If you set max_iter=7, you get pretty much what sklearn gives.
@pavanramkumar did you get a chance to get back to this?
Within GLM.fit(), it looks like the z cache gets out of sync with beta, which violates assumptions that self._cdfast() makes about z. Recalculating z = beta[0] + np.dot(X, beta[1:]) immediately before the call to self._cdfast() seems to prevent divergence. I suspect this is because beta[1:] is later modified by self._prox(), but the z cache isn't updated to reflect those changes.
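To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It is not pyglmnet's actual implementation: cd_pass and prox are hypothetical stand-ins for self._cdfast() and self._prox(), and the data are arbitrary. The point is that the proximal step changes beta but not the cached linear predictor z, so z has to be recomputed before the next coordinate-descent pass or the two drift apart.

# Illustrative sketch only -- not pyglmnet's code. cd_pass/prox stand in for
# self._cdfast()/self._prox() to show how a cached z can go stale.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)

beta = np.zeros(6)              # beta[0] is the intercept
z = beta[0] + X @ beta[1:]      # cached linear predictor


def cd_pass(X, y, beta, z):
    # One least-squares coordinate-descent sweep that keeps z in sync itself.
    for j in range(1, beta.shape[0]):
        xj = X[:, j - 1]
        old = beta[j]
        beta[j] = xj @ (y - (z - xj * old)) / (xj @ xj)
        z += xj * (beta[j] - old)   # incremental update keeps z consistent
    return beta, z


def prox(beta, thresh=0.05):
    # Soft-thresholding applied to beta only; z is NOT touched here.
    beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - thresh, 0.0)
    return beta


for it in range(20):
    # If this recomputation is skipped, z still reflects the pre-prox beta,
    # and the next cd_pass works against an inconsistent partial residual.
    z = beta[0] + X @ beta[1:]
    beta, z = cd_pass(X, y, beta, z)
    beta = prox(beta)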
Hmm... it's weird that this isn't caught by any of the tests. We already have tests which check that the loss goes down. Could you make a PR with an updated test? Thanks.
I'm wrapping up a branch/PR with an updated test that makes the loss-checking kick in by using a nonzero reg_lambda, and I've confirmed that skipping the caching of z resolves the divergence. I'll push everything up shortly.
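For reference, a loss-decrease test along those lines might look roughly like the sketch below. It mirrors the reproduction script above (same _loss call pattern and GLM arguments); the test name, data, and the final assertion are illustrative guesses, not the contents of the actual PR.

import numpy as np
from pyglmnet import GLM, _loss


def test_loss_decreases_with_nonzero_reg_lambda():
    # Hypothetical sketch of such a test; the real PR may differ in detail.
    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 10))
    y = np.dot(X, rng.normal(size=10)) + 0.1 * rng.normal(size=100)

    losses = []

    def callback(beta):
        # Same _loss call pattern as in the reproduction script above.
        losses.append(_loss('gaussian', 0.5, None, 0.1, X, y, 2.0, None, beta))

    GLM(distr='gaussian', alpha=0.5, reg_lambda=0.1, solver='cdfast',
        learning_rate=1e-2, max_iter=50, callback=callback).fit(X, y)

    # With a nonzero reg_lambda the loss check is meaningful: the trace should
    # end no higher than it started if the z cache stays consistent.
    assert losses[-1] <= losses[0]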
I tried to compare pyglmnet.GLM with ElasticNet from scikit-learn and could not get it to work. Code to reproduce:
Result: in-sample rmse sklearn = 12.756054997216442, pyglmnet = 161.03496460055504