whoisnnamdi closed this issue 4 years ago
The differences are really tiny if you reduce tol and increase max_iter. See below:
from pyglmnet import GLMCV
from pyglmnet import GLM
from pyglmnet.datasets import fetch_group_lasso_datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
df, group_idxs = fetch_group_lasso_datasets()
X = df[df.columns.difference(["Label"])].values
y = df.loc[:, "Label"].values
Xtrain, Xtest, ytrain, ytest = \
train_test_split(X, y, test_size=0.2, random_state=42)
# Setup lasso cv model
gl_glm = GLMCV(distr="binomial", tol=1e-7,
               score_metric="pseudo_R2",
               alpha=1.0, learning_rate=1, max_iter=200, cv=3, verbose=True)
# Fit the model
gl_glm.fit(Xtrain, ytrain)
# Save the optimal lambda based on highest score
opt_lambda = gl_glm.reg_lambda[gl_glm.scores_.index(max(gl_glm.scores_))]
print(opt_lambda) # 0.010000000000000007
# Setup lasso model using optimal lambda found earlier, all other relevant parameters kept the same
glm = GLM(distr="binomial", tol=1e-7, reg_lambda=opt_lambda,
          score_metric="pseudo_R2",
          alpha=1.0, learning_rate=1., max_iter=200, verbose=True)
# Fit the model
glm.fit(Xtrain, ytrain)
# Compare beta coefficients
print(gl_glm.beta_ - glm.beta_)
plt.plot(gl_glm.beta_)
plt.plot(glm.beta_, 'r')
plt.show()
I see this:
This is especially the case for the second model, which is not helped by warm start.
Thanks, I noticed this too - that changing the tol and max_iter parameters could get the coefficient values closer.
That said, there are a few instances in your example where GLMCV assigns a non-zero value while GLM does not, or vice versa, which matters if one is using this for variable selection. Is this effectively unavoidable?
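For the variable-selection concern, one way to quantify the mismatch is to compare the support sets (indices of non-zero coefficients) of the two fitted models directly. A minimal sketch with NumPy; the two arrays below are stand-ins for the fitted `gl_glm.beta_` and `glm.beta_` attributes, not real output:

```python
import numpy as np

# Stand-ins for gl_glm.beta_ and glm.beta_ from the fitted models
beta_cv = np.array([0.0, 0.5, -0.2, 0.0, 0.1])
beta_glm = np.array([0.0, 0.48, -0.19, 0.03, 0.0])

support_cv = set(np.flatnonzero(beta_cv))
support_glm = set(np.flatnonzero(beta_glm))

# Symmetric difference: features selected by one model but not the other
mismatch = sorted(support_cv ^ support_glm)
print(mismatch)  # [3, 4]
```

If this set is non-empty even after tightening tol, the two fits genuinely disagree on which features to keep.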
Well, you need to push the convergence even further. You'll see that there is this warning currently:
/Users/mainak/Documents/github_repos/pyglmnet/pyglmnet/pyglmnet.py:900: UserWarning: Reached max number of iterations without convergence.
"Reached max number of iterations without convergence.")
Use max_iter=500 and you'll see there is almost no difference between the two.
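Whether a fit actually converged can also be checked programmatically by capturing that UserWarning around the fit call. A hedged sketch: the matched message is the one pyglmnet prints above, and `capped_fit` is a stand-in for a real `glm.fit(Xtrain, ytrain)` call:

```python
import warnings

def fit_with_convergence_check(fit_fn):
    """Call fit_fn and report whether it emitted pyglmnet's
    'Reached max number of iterations' UserWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fit_fn()
    return not any("max number of iterations" in str(w.message)
                   for w in caught)

# Stand-in for a fit that hits the iteration cap
def capped_fit():
    warnings.warn("Reached max number of iterations without convergence.")

print(fit_with_convergence_check(capped_fit))    # False
print(fit_with_convergence_check(lambda: None))  # True
```

This turns the warning into a boolean you can assert on before trusting the coefficients.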
I agree it's a bit hard to debug this. Wouldn't be opposed to adding a plot_convergence method to the GLM object if that helps you.
Thanks for the help. Yes, a plot_convergence method would be super helpful if it's not too difficult to implement!
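Until such a method exists, the kind of trace it would display can be produced by hand. The toy proximal-gradient lasso below (an illustrative sketch of the general technique, not pyglmnet's actual solver) records the penalized objective at each iteration; plotting that trace on a log scale makes a fit that has stalled short of convergence easy to spot:

```python
import numpy as np

def lasso_convergence_trace(X, y, reg_lambda, lr=0.01, max_iter=200):
    """Toy proximal-gradient lasso that records the objective per
    iteration -- the trace a plot_convergence method could display."""
    n, p = X.shape
    beta = np.zeros(p)
    trace = []
    for _ in range(max_iter):
        grad = X.T @ (X @ beta - y) / n       # gradient of the squared loss
        beta = beta - lr * grad
        # soft-threshold: proximal step for the L1 penalty
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * reg_lambda, 0.0)
        obj = 0.5 * np.mean((y - X @ beta) ** 2) + reg_lambda * np.abs(beta).sum()
        trace.append(obj)
    return beta, trace

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
true_beta = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ true_beta + 0.1 * rng.standard_normal(100)

beta, trace = lasso_convergence_trace(X, y, reg_lambda=0.1)
print(trace[0] > trace[-1])  # True: objective decreased over the run
```

A flat tail in the trace suggests convergence; a trace still sloping downward at max_iter means the tol/max_iter budget was too tight, which is exactly the situation in this issue.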
Problem
Optimal penalization parameter (lambda) found via GLMCV does not yield similar results when plugged into GLM with otherwise identical parameters.

Example
The script below mostly follows the group lasso example code in the docs, except modified for regular lasso instead of group lasso.

Results
You can tweak the learning rate and iterations of the second model, but the results never match those of GLMCV, even with many iterations and a low learning rate.
I understand there may be some inherent instability in how convergence is reached, but this feels like too much. Since an important purpose of the lasso is feature selection, having certain variables with non-zero coefficients in one model but not the other somewhat defeats that purpose.
Thank you for looking into this.