glm-tools / pyglmnet

Python implementation of elastic-net regularized generalized linear models
http://glm-tools.github.io/pyglmnet/
MIT License

ValueError: group should be (n_features,) #291

Closed duemig closed 4 years ago

duemig commented 5 years ago

image

I don't understand this error.

jasmainak commented 5 years ago

It doesn't happen for me. Can you provide a full script to reproduce it instead of a screenshot? Here is what I tried:

import numpy as np
from pyglmnet import GLM

group_ids = np.random.random(36)
X_train_trans = np.random.random((42603, 36))
y_train = np.random.random(42603)

glm = GLM(distr="gaussian", group=group_ids, alpha=0.05, reg_lambda=0.2, max_iter=1000)
glm.fit(X=X_train_trans, y=y_train)

duemig commented 5 years ago

I found it

image

now it works.

It is due to the datatype (np.float32 vs np.float64)

Could you fix that?
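A minimal workaround sketch, assuming the dtype really is the trigger: cast everything to float64 before building the model. `as_float64` is a hypothetical helper, not part of pyglmnet, and the shape check only mirrors what the error message in the title suggests (one group entry per column of X).

import numpy as np


def as_float64(*arrays):
    """Return the inputs as float64 NumPy arrays (copies only when a cast is needed)."""
    return tuple(np.asarray(a, dtype=np.float64) for a in arrays)


# Small synthetic float32 inputs, shaped like the data in this thread.
rng = np.random.default_rng(0)
group_ids = rng.random(36).astype(np.float32)
X_train_trans = rng.random((100, 36)).astype(np.float32)
y_train = rng.random(100).astype(np.float32)

group_ids, X_train_trans, y_train = as_float64(group_ids, X_train_trans, y_train)

# Assumption from the error text: group must have shape (n_features,).
if group_ids.shape != (X_train_trans.shape[1],):
    raise ValueError("group should be (n_features,)")

The cast arrays can then be passed to GLM(..., group=group_ids) and glm.fit(X=X_train_trans, y=y_train) as in the script above.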

Can I use sklearn's GridSearchCV to determine the parameters?

Thanks

Best, David

jasmainak commented 5 years ago

Can you modify my script to show me how to make it fail? It works for me whether I use np.float32 or np.float64.

Yes, GridSearchCV used to work, but I am not sure whether it works with the latest version of sklearn.

duemig commented 5 years ago

import numpy as np
from pyglmnet import GLM

group_ids = np.float32(np.random.random(36))
X_train_trans = np.random.random((42603, 36))
y_train = np.random.random(42603)

glm = GLM(distr="gaussian", group=np.float32(group_ids), alpha=0.05, reg_lambda=0.2, max_iter=1000)
glm.fit(X=np.float32(X_train_trans), y=np.float32(y_train))

image

But with sklearn's GridSearchCV as well? So not GLMCV?

Can I use the package as a group lasso for penalizing the betas of a cubic spline representation?

duemig commented 5 years ago

Is there already an open issue for the following?

image

Or am I doing something wrong?

If I install pyglmnet, I get version 1.0.0:

image

duemig commented 5 years ago

image image

Does not seem to work ;(

jasmainak commented 5 years ago

You need to use the development version for this. Unfortunately, a release has been overdue for a long time. Can you try the development version in the meantime?

jasmainak commented 5 years ago

But with sklearn's GridSearchCV as well? So not GLMCV?

You can use either, depending on your application.
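A rough sketch of the GridSearchCV route. To keep it runnable without pyglmnet, it uses sklearn's ElasticNet as a stand-in estimator; that substitution is an assumption of this example. With pyglmnet, the grid would presumably be over the constructor arguments shown earlier in this thread (alpha, reg_lambda), provided GLM still works with the installed sklearn version.

import numpy as np
from sklearn.linear_model import ElasticNet  # stand-in for pyglmnet's GLM
from sklearn.model_selection import GridSearchCV

# Synthetic data with the same number of features as in this thread.
rng = np.random.default_rng(0)
X = rng.random((200, 36))
y = rng.random(200)

# Parameter names here belong to ElasticNet, not to pyglmnet.
param_grid = {"alpha": [0.01, 0.1], "l1_ratio": [0.05, 0.5]}

search = GridSearchCV(ElasticNet(max_iter=1000), param_grid, cv=3)
search.fit(X, y)

best_params = search.best_params_
print(best_params)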

Can I use the package as a group lasso for penalizing the betas of a cubic spline representation?

Sorry, I don't know exactly what you are trying to do, but yes, we do support group lasso.

duemig commented 5 years ago

Thank you for your answer.

I will try this tomorrow and let you know whether it works.

However, from the source code it seems that time series cross-validation with TimeSeriesSplit (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) is not supported.

This would be super helpful for time series prediction tasks where k-fold etc. fail.

jasmainak commented 5 years ago

It would be nice for GLMCV to accept a cv object from sklearn, but nothing stops you from using your own cv with cross_val_score, etc.
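Something along these lines, as a sketch only. ElasticNet is again a stand-in so the snippet runs without pyglmnet; swapping in GLM assumes it is compatible with cross_val_score in your sklearn version.

import numpy as np
from sklearn.linear_model import ElasticNet  # stand-in for pyglmnet's GLM
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic data; rows are assumed to be in time order.
rng = np.random.default_rng(0)
X = rng.random((300, 36))
y = rng.random(300)

# Each split trains on earlier rows and tests on the rows that follow.
tscv = TimeSeriesSplit(n_splits=5)

estimator = ElasticNet(alpha=0.01, l1_ratio=0.05)
scores = cross_val_score(estimator, X, y, cv=tscv)
print(scores)

The same tscv object can also be passed as cv= to GridSearchCV.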

duemig commented 5 years ago

Hey,

Is there a reason why it becomes so slow when I use the Github version?

image

GridSearchCV seems to work:

image

But it is super slow ;(

Any suggestions? For my purpose it's infeasible.

jasmainak commented 5 years ago

Just to be sure it's not a problem with the convergence criteria, can you set the max_iter lower and check the timings?
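One way to run that check, as a sketch: time the fit for a few max_iter values and compare. `time_fit` is a hypothetical helper, and ElasticNet stands in for GLM so the snippet runs on its own; for the real test, pass a factory that builds GLM(..., max_iter=m) instead.

import time
import warnings

import numpy as np
from sklearn.linear_model import ElasticNet  # stand-in for pyglmnet's GLM


def time_fit(make_estimator, X, y, max_iters):
    """Fit one fresh estimator per max_iter value and return wall-clock seconds."""
    timings = {}
    for max_iter in max_iters:
        est = make_estimator(max_iter)
        with warnings.catch_warnings():
            # Low max_iter values may trigger convergence warnings; ignore them here.
            warnings.simplefilter("ignore")
            start = time.perf_counter()
            est.fit(X, y)
            timings[max_iter] = time.perf_counter() - start
    return timings


rng = np.random.default_rng(0)
X = rng.random((500, 36))
y = rng.random(500)

timings = time_fit(lambda m: ElasticNet(alpha=0.001, max_iter=m), X, y, [10, 100, 1000])
for max_iter, seconds in timings.items():
    print(f"max_iter={max_iter}: {seconds:.4f} s")

If the time grows roughly with max_iter, the solver is probably running to the iteration cap rather than converging.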

pavanramkumar commented 4 years ago

Seems like the slowness arises from the same root cause (group lasso). Duplicated by #267.