dwf / glmnet-python

Wrappers of Jerome Friedman's coordinate-descent Fortran implementation of lasso/elastic net regression from the R "glmnet" package.
GNU General Public License v2.0

Segfault when training elastic net / lasso with wide problems #1

Open ogrisel opened 14 years ago

ogrisel commented 14 years ago

When the number of features is much larger than the number of samples, I get a segmentation fault. The following script reproduces the problem:

import numpy as np
from glmnet.elastic_net import Lasso

# problem dim
n_samples = 100
n_features = 100000
n_informative_features = 10

# normally distributed input signal
X = np.random.randn(n_samples, n_features)

# generate a ground truth model in which only the first 10 features are
# non-zero (the other features are uncorrelated with Y and should be ignored
# by the L1 regularizer)
true_coef = np.zeros(n_features)
true_coef[:n_informative_features] = np.random.randn(n_informative_features)

# generate the ground truth Y from the reference model and X + label noise
Y = np.dot(X, true_coef) + np.random.normal(scale=0.1, size=n_samples)

print(Lasso(alpha=1).fit(X, Y))

dwf commented 14 years ago

Can you try with the R glmnet package? It might just be the Fortran code...
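
Independently of R, it may help to know roughly how wide the problem has to be before the crash appears. The following is only a sketch under several assumptions: `crashes` and `smallest_failing` are hypothetical helper names, the fit runs in a child process so a segfault does not take down the calling interpreter, a negative return code is taken to mean "killed by a signal" (POSIX behaviour), and the crash is assumed to be monotonic in `n_features`, which may not hold.

```python
import subprocess
import sys


def crashes(n_features, n_samples=100):
    """Run one Lasso fit in a child process.

    Returns True if the child was killed by a signal (e.g. SIGSEGV).
    On POSIX, subprocess reports that as a negative return code.
    """
    script = (
        "import numpy as np\n"
        "from glmnet.elastic_net import Lasso\n"
        "X = np.random.randn(%d, %d)\n"
        "Y = np.random.randn(%d)\n"
        "Lasso(alpha=1).fit(X, Y)\n"
    ) % (n_samples, n_features, n_samples)
    return subprocess.call([sys.executable, "-c", script]) < 0


def smallest_failing(fails, lo, hi):
    """Bisect for the smallest n in (lo, hi] for which fails(n) is true.

    Assumes fails(lo) is False and fails(hi) is True, and that failure
    is monotonic in n (an assumption, not something the report confirms).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Calling `smallest_failing(crashes, 100, 100000)` would then bisect between a square problem and the failing size from the script above, at the cost of one child process per step.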

ogrisel commented 14 years ago

I am not familiar with R, but I will give it a shot as soon as I receive the "R in a Nutshell" book from Amazon.