Segfault when training elastic net / lasso with wide problems

ogrisel commented 14 years ago

When the number of features is much larger than the number of samples, I get a segmentation fault. The following script reproduces the problem:

import numpy as np
from glmnet.elastic_net import Lasso

# problem dim
n_samples = 100
n_features = 100000
n_informative_features = 10

# normally distributed input signal
X = np.random.randn(n_samples, n_features)

# generate a ground truth model in which only the first 10 features are
# nonzero (the other features are not correlated with Y and should be ignored
# by the L1 regularizer)
true_coef = np.zeros(n_features)
true_coef[:n_informative_features] = np.random.randn(n_informative_features)

# generate the ground truth Y from the reference model and X + label noise
Y = np.dot(X, true_coef) + np.random.normal(scale=0.1, size=n_samples)

# training the model is the step that segfaults
print(Lasso(alpha=1).fit(X, Y))
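A segfault kills the interpreter, so it may help to run the fit in a child process and inspect how it exited. What follows is only a rough sketch and not part of glmnet-python: `CHILD_TEMPLATE`, `run_isolated`, `describe` and `probe` are hypothetical helper names, and the "killed by signal" reading of a negative exit status assumes a POSIX system.

```python
import subprocess
import sys

# Child script (hypothetical helper): same setup as the reproduction above,
# with the number of features left as a parameter.
CHILD_TEMPLATE = """
import numpy as np
from glmnet.elastic_net import Lasso

n_samples, n_features = 100, {n_features}
X = np.random.randn(n_samples, n_features)
true_coef = np.zeros(n_features)
true_coef[:10] = np.random.randn(10)
Y = np.dot(X, true_coef) + np.random.normal(scale=0.1, size=n_samples)
Lasso(alpha=1).fit(X, Y)
"""


def run_isolated(code):
    """Run `code` in a fresh interpreter and return its exit status."""
    return subprocess.call([sys.executable, "-c", code])


def describe(returncode):
    """Turn an exit status into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # Assumption: POSIX, where a negative status means death by signal.
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def probe(feature_counts):
    """Fit once per feature count, each in its own process."""
    results = []
    for n_features in feature_counts:
        code = CHILD_TEMPLATE.format(n_features=n_features)
        results.append((n_features, describe(run_isolated(code))))
    return results


if __name__ == "__main__":
    for n_features, outcome in probe([100, 1000, 10000]):
        print("%d features: %s" % (n_features, outcome))
```

Calling `probe` with increasing feature counts (up to the 100000 used above) should show at which width the fit stops returning normally, without taking down the parent process.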

dwf commented 14 years ago

Can you try with the R glmnet package? It might just be the Fortran code...

ogrisel commented 14 years ago

I am not familiar with R, but I will give it a shot as soon as I receive the R in a Nutshell book from Amazon.

dwf / glmnet-python

Segfault when training elastic net / lasso with wide problems #1