bbalasub1 / glmnet_python

GNU General Public License v3.0

Different `lambdau` output for sparse and dense matrices #25

Open corradio opened 6 years ago

corradio commented 6 years ago

I am having trouble getting consistent results: the set of lambda values selected by `cvglmnet` differs depending on whether the input matrix is sparse or dense.

`lambdau` using `cvglmnet(x=X.copy(), y=y.copy(), family='gaussian', parallel=True, keep=True, standardize=False, alpha=0.999, thresh=1e-10, standardize_resp=False)`:

[0.20531829 0.18707838 0.17045885 0.15531576 0.14151794 0.12894587
 0.11749068 0.10705313 0.09754282 0.08887739 0.08098177 0.07378757
 0.06723248 0.06125974 0.05581759 0.05085891 0.04634074]

`lambdau` using `cvglmnet(x=X.todense().copy(), y=y.copy(), family='gaussian', parallel=True, keep=True, standardize=False, alpha=0.999, thresh=1e-10, standardize_resp=False)` (the only difference is that a dense matrix is used as input):

[0.2038106  0.18570463 0.16920714 0.15417525 0.14047874 0.127999
 0.11662792 0.10626702 0.09682655 0.08822474 0.0803871  0.07324573
 0.06673878 0.06080989 0.05540771 0.05048544 0.04600045 0.0419139
 0.03819039 0.03479766 0.03170633 0.02888963 0.02632315 0.02398468
 0.02185394 0.0199125  0.01814353 0.01653171 0.01506308 0.01372491
 0.01250563 0.01139466 0.01038239 0.00946005 0.00861965 0.0078539
 0.00715618 0.00652045 0.00594119 0.00541339 0.00493248 0.00449429
 0.00409503 0.00373124 0.00339977 0.00309774 0.00282255 0.0025718
 0.00234333 0.00213515 0.00194547 0.00177264 0.00161516 0.00147168
 0.00134094 0.00122181]
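A quick way to see where the two outputs diverge is to compare their step ratios and starting values. The sketch below uses only the first six values of each sequence as printed above. The reference ratio assumes a 100-point log-spaced grid spanning a factor of 1e-4; those two numbers are assumptions about the defaults in effect, not something confirmed here.

```python
import numpy as np

# First six lambdau values copied from the sparse and dense outputs above.
lam_sparse = np.array([0.20531829, 0.18707838, 0.17045885,
                       0.15531576, 0.14151794, 0.12894587])
lam_dense = np.array([0.2038106, 0.18570463, 0.16920714,
                      0.15417525, 0.14047874, 0.127999])

# Ratio between consecutive values; constant for a log-spaced grid.
ratio_sparse = lam_sparse[1:] / lam_sparse[:-1]
ratio_dense = lam_dense[1:] / lam_dense[:-1]

# Step ratio of a 100-point grid ending at 1e-4 * lambda_max (assumed defaults).
expected = (1e-4) ** (1.0 / 99)

print("sparse step ratios:", ratio_sparse.round(5))
print("dense step ratios: ", ratio_dense.round(5))
print("assumed grid ratio:", round(expected, 5))

# Relative difference between the two starting values.
start_gap = lam_sparse[0] / lam_dense[0] - 1
print("relative gap in first lambda:", start_gap)
```

Both sequences step down at the same rate, so the discrepancy seems to come from two separate things: the starting value (the sparse one is roughly 0.7% larger) and the point at which each path stops.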

I've attached matrices (numpy format) to reproduce.

Xy.zip

Note: here's how to load matrices:

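import numpy as np
# x.npy wraps the sparse matrix in a 0-d object array; newer NumPy needs allow_pickle=True to load it
X = np.load('x.npy', allow_pickle=True).tolist()
y = np.load('y.npy')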

corradio commented 6 years ago

I've been able to trace it back to `elnet.py`: in the sparse case, the Fortran call returns an `lmu_r` of 18 instead of 56, which truncates the `lambdau` sequence.

corradio commented 6 years ago

In case others stumble upon the same issue, I'm now using scikit-learn's path generation as an alternative:

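import sklearn.linear_model  # _alpha_grid is a private helper; its module path may change between scikit-learn releases
lambdau_sk = sklearn.linear_model.coordinate_descent._alpha_grid(X, y)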
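For anyone who would rather not depend on a private scikit-learn function, here is a minimal sketch of building a comparable grid by hand. `lambda_grid` and its parameters are hypothetical names introduced for illustration. It assumes a Gaussian model with an intercept and unstandardized predictors, and it is not a claim about what the glmnet Fortran routine or `_alpha_grid` compute internally.

```python
import numpy as np
from scipy import sparse


def lambda_grid(X, y, l1_ratio=1.0, n_lambda=100, min_ratio=1e-4):
    """Log-spaced penalty grid for a Gaussian elastic net with an intercept.

    Hypothetical helper; not part of glmnet_python or scikit-learn.
    """
    y = np.asarray(y, dtype=float).ravel()
    n = y.shape[0]
    y_centered = y - y.mean()
    # X^T (y - mean(y)) equals the product with column-centred X, because the
    # centred response sums to zero. Sparse input never has to be densified.
    if sparse.issparse(X):
        Xty = X.T @ y_centered
    else:
        Xty = np.asarray(X).T @ y_centered
    lam_max = np.abs(np.asarray(Xty)).max() / (n * l1_ratio)
    # Descending grid from lam_max down to min_ratio * lam_max.
    return np.logspace(np.log10(lam_max), np.log10(lam_max * min_ratio),
                       num=n_lambda)


# Synthetic data only; the attached matrices are not used here.
rng = np.random.default_rng(0)
X_dense = rng.normal(size=(50, 8)) * (rng.random((50, 8)) < 0.3)
y_demo = rng.normal(size=50)
X_sparse = sparse.csr_matrix(X_dense)

grid_dense = lambda_grid(X_dense, y_demo, l1_ratio=0.999)
grid_sparse = lambda_grid(X_sparse, y_demo, l1_ratio=0.999)
print("grids agree:", np.allclose(grid_dense, grid_sparse))
```

Because the grid is computed once from the data, the same array can be reused for both the sparse and the dense input, so the two fits are at least evaluated on the same penalties. Whether the installed glmnet_python version accepts a user-supplied sequence should be checked against its documentation.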