AnchorBlues / GroupLasso

Group Lasso package for Python.
BSD 2-Clause "Simplified" License

diverging results #1

Closed · duemig closed this 5 years ago

duemig commented 5 years ago

Hey,

Using your code gives me different results compared to the following two implementations (which both give me the same result):

https://github.com/rtavenar/SparseGroupLasso

https://gist.github.com/fabianp/1423373

I set the parameters of grouplasso such that I get plain group lasso.

However, I am not able to get the same results.

Any idea?

Best,

David

AnchorBlues commented 5 years ago

@dduemig

If you set max_iter=1000000 in GroupLassoRegressor, the calculation will converge and you will get the same result.

import numpy as np
from sklearn import datasets
from grouplasso import GroupLassoRegressor

# Diabetes dataset: 10 features, continuous target
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

# group_ids = [0, 0, 0, 1, 2, 3, 4, 5, 6, 7]: features sharing an id are penalized as one group
group_ids = np.r_[[0, 0], np.arange(X.shape[1] - 2)]
model = GroupLassoRegressor(group_ids=group_ids, alpha=0.1, max_iter=1000000)
model.fit(X, y)
for coef_i, coef in enumerate(model.coef_):
    print("coef{}:{}".format(coef_i, coef))

# coef0:3.0803703107855847
# coef1:-191.56360870430598
# coef2:515.5642971813545
# coef3:281.54507262548685
# coef4:-49.95874746864712
# coef5:-0.0
# coef6:-225.12685719068176
# coef7:0.0
# coef8:477.9147544566873
# coef9:36.22867657644983

If you want the calculation to converge quickly, you should normalize X with a scaler (such as StandardScaler) before model.fit, because the scale of X is very small.
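
A minimal sketch of that scaling step (StandardScaler comes from scikit-learn; the reduced max_iter here is only an illustrative guess, and the best alpha on standardized features will generally differ from the alpha used on the raw data):

from sklearn.preprocessing import StandardScaler

# Standardize the features so the solver converges in far fewer iterations
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = GroupLassoRegressor(group_ids=group_ids, alpha=0.1, max_iter=1000)  # max_iter is a guess
model.fit(X_scaled, y)
# These coefficients are on the standardized scale; divide by scaler.scale_
# to compare them with the coefficients fitted on the raw X.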

Thanks.

duemig commented 5 years ago

Thank you for your feedback.

It's getting close:

import numpy as np
from sklearn import datasets
from grouplasso import GroupLassoRegressor

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
group_ids = np.r_[[0, 0], np.arange(X.shape[1] - 2)]

model = GroupLassoRegressor(group_ids=group_ids, alpha=0.1, max_iter=100000000, verbose=False)
model.fit(X, y)
for coef_i, coef in enumerate(model.coef_):
    print("coef{}:{}".format(coef_i, coef))

# Reference implementation: SGL with alpha=0 keeps only the group penalty,
# i.e. plain group lasso with strength lbda
from group_lasso import *
sparse_ids = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
sparse_lasso = SGL(groups=group_ids, alpha=0, lbda=0.1, ind_sparse=sparse_ids,
                   max_iter_outer=100000000, max_iter_inner=100000000)
sparse_lasso.fit(X, y)
print(sparse_lasso.coef_)

where from group_lasso import * imports the following implementation: https://github.com/rtavenar/SparseGroupLasso

However, it seems a bit slow for my purposes (applying over 200 10-y rolling regressions on splines (6 features * 6 basis functions), using GridSearchCV to find the optimal alpha).
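
A rough sketch of that grid search, assuming GroupLassoRegressor follows the scikit-learn estimator API (get_params/set_params) so GridSearchCV can tune alpha; the dummy data, alpha grid, max_iter, and CV scheme below are placeholders rather than anything from this thread:

import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from grouplasso import GroupLassoRegressor

# Dummy stand-in for one 10-y window: 120 rows, 6 features x 6 spline basis columns
rng = np.random.default_rng(0)
X_splines = rng.standard_normal((120, 36))
y = rng.standard_normal(120)

# Map each spline basis column back to its source feature, so a whole
# spline expansion is kept or dropped together
group_ids = np.repeat(np.arange(6), 6)

param_grid = {"alpha": [0.01, 0.05, 0.1, 0.5, 1.0]}  # placeholder grid
search = GridSearchCV(
    GroupLassoRegressor(group_ids=group_ids, alpha=0.1, max_iter=10000),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),  # time-ordered folds for rolling-window data
    scoring="neg_mean_squared_error",
)
search.fit(X_splines, y)
print(search.best_params_)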

Best, David

duemig commented 5 years ago

This is what it looks like for one of these 10-y rolling regressions:

[image: result of one 10-y rolling regression]

and the data looks like this:

[image: the input data]