getspams / spams-python

Python interface for SPAMS (SPArse Modeling Software)
https://thoth.inrialpes.fr/people/mairal/spams/
GNU General Public License v3.0

Issue with spams.lasso() function #34

Closed: carversh closed this issue 1 year ago

carversh commented 1 year ago

Hi,

I am running spams.lasso on two sets of numpy arrays. The function works in one case but not in the other, and I am not sure why.

Instance where spams.lasso works:

Z_1 = spams.lasso(X, D=D, lambda1=params['lambda1'], verbose=True)

Shapes of the input matrices:

>>> X.shape
(15668, 24192)
>>> D.shape
(15668, 15)

Instance where spams.lasso doesn't work:

Z_2 = spams.lasso(Z_1.T, D=ind_mat.T, lambda1=params['lambda1'], verbose=True)

Shapes of the input matrices:

>>> ind_mat.T.shape
(24192, 600)
>>> Z_1.T.shape
(24192, 15)

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../lib/python3.9/site-packages/spams/spams.py", line 456, in lasso
    (indptr, indices, data, shape) = spams_wrap.lassoD(X, D, return_reg_path, L,
  File "/.../lib/python3.9/site-packages/spams_wrap/spams_wrap.py", line 228, in lassoD
    return _spams_wrap.lassoD(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'lassoD'.
  Possible C/C++ prototypes are:
    _lassoD< double >(Matrix< double > *,Matrix< double > *,Matrix< double > **,bool,int,double const,double const,constraint_type,bool const,bool const,int const,int,bool const,bool)
    _lassoD< float >(Matrix< float > *,Matrix< float > *,Matrix< float > **,bool,int,float const,float const,constraint_type,bool const,bool const,int const,int,bool const,bool)

Given the shapes I posted, I don't believe the array dimensions are the problem. Help would be greatly appreciated.

carversh commented 1 year ago

Additionally, this regression works for me when using sklearn, but I wanted to stick with this package. I'm not sure whether it's a bug or whether I am using an incompatible package version.

samuelstjean commented 1 year ago

Turns out I was wrong, it's much simpler: the transpose operator does not copy the array, it only changes how the array is read in memory. That means it is no longer Fortran-contiguous (column-major), hence the complaint. Try instead

Z_2 = spams.lasso(np.asfortranarray(Z_1.T), D=np.asfortranarray(ind_mat.T), lambda1=0.1)
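
A minimal NumPy-only sketch of the point above (no spams call; the small array z is a hypothetical stand-in for Z_1, assumed to start out Fortran-ordered): .T returns a view with swapped strides, so a column-major array reads as row-major afterwards, and np.asfortranarray makes a column-major copy.

```python
import numpy as np

# Hypothetical stand-in for Z_1, allocated in Fortran (column-major) order.
z = np.asfortranarray(np.arange(6, dtype=np.float64).reshape(2, 3))
print(z.flags["F_CONTIGUOUS"], z.flags["C_CONTIGUOUS"])    # True False

# .T is a view: no data is copied, only shape and strides are swapped,
# so the column-major buffer is now read as row-major (C order).
zt = z.T
print(np.shares_memory(z, zt))                              # True
print(zt.flags["F_CONTIGUOUS"], zt.flags["C_CONTIGUOUS"])  # False True

# np.asfortranarray copies the data into column-major order when needed.
zt_f = np.asfortranarray(zt)
print(zt_f.flags["F_CONTIGUOUS"], np.shares_memory(zt, zt_f))  # True False
```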

You can look at the flags to check whether the data is in the right format, like this:

In [23]: np.asfortranarray(Z_1.T).flags
Out[23]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [18]: np.array(Z_1.T).flags
Out[18]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

Just make sure your input is Fortran-contiguous (column-major). That's the error I get here: RuntimeError: matrix arg 1 must be a 2d double Fortran Array, which hopefully makes sense given the snippet you posted. I just saw that you get a different message, but this could explain why your first call passes and the next one does not. A full example would probably help us tell where it went wrong.
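
The requirements in that message can be checked before calling spams. The helper below is hypothetical (it is not part of spams), and its checks are assumptions drawn from the RuntimeError text above and from the C++ prototypes in the traceback, which list only double and float matrices.

```python
import numpy as np

def spams_input_problems(a):
    """Hypothetical helper (not part of spams): list reasons `a` would likely be
    rejected, assuming spams wants a 2-D float32/float64 Fortran-ordered ndarray."""
    if not isinstance(a, np.ndarray):
        return [f"not a numpy.ndarray (got {type(a).__name__})"]
    problems = []
    if a.ndim != 2:
        problems.append(f"expected 2-D, got ndim={a.ndim}")
    # dtype.type is the scalar type, e.g. np.float64 for a float64 array.
    if a.dtype.type not in (np.float32, np.float64):
        problems.append(f"expected float32 or float64, got {a.dtype}")
    if not a.flags["F_CONTIGUOUS"]:
        problems.append("not Fortran-contiguous")
    return problems

print(spams_input_problems(np.asfortranarray(np.ones((4, 3)))))  # []
print(spams_input_problems(np.ones((4, 3))))  # ['not Fortran-contiguous']
```

Since both prototypes take X and D with the same element type, it is presumably also worth confirming that both inputs share one dtype.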

carversh commented 1 year ago

Hi Samuel, thanks for answering! I actually did try this.

When I run the following code:

Z_2 = spams.lasso(np.asfortranarray(Z_1.T), D=np.asfortranarray(ind_mat.T), lambda1=params['lambda1'], verbose=True)

I still get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../lib/python3.8/site-packages/spams/spams.py", line 456, in lasso
    (indptr, indices, data, shape) = spams_wrap.lassoD(X, D, return_reg_path, L,
  File "/.../lib/python3.8/site-packages/spams_wrap/spams_wrap.py", line 228, in lassoD
    return _spams_wrap.lassoD(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'lassoD'.
  Possible C/C++ prototypes are:
    _lassoD< double >(Matrix< double > *,Matrix< double > *,Matrix< double > **,bool,int,double const,double const,constraint_type,bool const,bool const,int const,int,bool const,bool)
    _lassoD< float >(Matrix< float > *,Matrix< float > *,Matrix< float > **,bool,int,float const,float const,constraint_type,bool const,bool const,int const,int,bool const,bool)

carversh commented 1 year ago

I think I see the problem: when I convert one of the arrays to a Fortran array, its shape gets messed up.

>>> np.asfortranarray(ind_mat.T).shape
(24192, 600)
>>> np.asfortranarray(Z_1.T).shape
(1,)

carversh commented 1 year ago

I'm not sure how to prevent the dimensions from collapsing like this.

carversh commented 1 year ago

Here's a demonstration of what's happening, which I can't really explain:

>>> Z_1 = spams.lasso(np.asfortranarray(X), D=np.asfortranarray(D), lambda1=10, verbose=True)
>>> Z_1.shape
(10, 24192)
>>> Z_1_T = Z_1.T 
>>> Z_1_T.shape
(24192, 10)
>>> np.asfortranarray(Z_1_T).shape
(1,)

carversh commented 1 year ago

Not sure if this also helps:

>>> np.asfortranarray(Z_1).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> np.asfortranarray(Z_1.T).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
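
One likely reason these flags look fine: an array with a single element is trivially both C- and F-contiguous, so the layout flags say nothing about what that element is. A sketch using a hypothetical placeholder class in place of the sparse matrix (assumption: NumPy treats it, like the sparse matrix here, as one opaque object):

```python
import numpy as np

class Opaque:
    """Hypothetical stand-in for an object NumPy cannot convert element-wise."""

# NumPy cannot read numbers out of the object, so it wraps it as a single
# element of an object array; asfortranarray guarantees ndim >= 1.
wrapped = np.asfortranarray(Opaque())
print(wrapped.shape, wrapped.dtype)  # (1,) object

# A one-element array counts as both C- and F-contiguous, so these flags
# look "right" even though the array holds a Python object, not numbers.
print(wrapped.flags["C_CONTIGUOUS"], wrapped.flags["F_CONTIGUOUS"])  # True True

# Checking ndim and dtype exposes what the layout flags hide.
print(wrapped.ndim == 2, wrapped.dtype.kind == "f")  # False False
```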

samuelstjean commented 1 year ago

Looks like your Z_1 array is a bit strange, since it somehow collapses into a single element. And the flags seem to say it does not own its data in any case, so it might be a view or some other object wrapping an array, which the underlying C code here cannot use (but sklearn somehow can, either because it stays in pure Python or because it silently copies the data to avoid problems).

I'd suggest finding where and how your array is modified, since it looks like the problem happens before the function is called.

carversh commented 1 year ago

>>> np.asfortranarray(Z_1.T)
array([<24192x10 sparse matrix of type '<class 'numpy.float64'>' with 126443 stored elements in Compressed Sparse Row format>], dtype=object)
>>> np.asfortranarray(Z_1.T).shape
(1,)

Not sure how to resolve this.

carversh commented 1 year ago

Z_1 is an output from your function!

carversh commented 1 year ago

This is Z_1:

Z_1 = spams.lasso(np.asfortranarray(X), D=np.asfortranarray(D), lambda1=10, verbose=True)

carversh commented 1 year ago

I'm not really sure how to inspect an array returned by your package.
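
One way to inspect what spams.lasso returned is to ask SciPy whether it is sparse and, if so, densify it explicitly. A sketch, assuming (as the CSR repr above suggests) that the result is a scipy.sparse matrix; Z_1 below is a hypothetical, much smaller stand-in.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-in for Z_1 (10 x 50 here instead of 10 x 24192).
Z_1 = sp.csc_matrix(np.eye(10, 50))

# Confirm what kind of object came back before handing it to NumPy.
print(sp.issparse(Z_1), Z_1.format, Z_1.shape)  # True csc (10, 50)

# Transposing a CSC matrix gives a CSR matrix, which is still sparse.
Z_1_T = Z_1.T
print(Z_1_T.format)  # csr

# .toarray() returns a plain ndarray (.todense() returns np.matrix);
# order="F" asks for a column-major result directly.
Z_1_T_dense = Z_1_T.toarray(order="F")
print(type(Z_1_T_dense).__name__, Z_1_T_dense.shape, Z_1_T_dense.dtype)  # ndarray (50, 10) float64
print(Z_1_T_dense.flags["F_CONTIGUOUS"])  # True
```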

carversh commented 1 year ago

Even when I do this, so that the dimensions are correct and both inputs are Fortran-ordered numpy arrays, I still get the same issue:


>>> np.asfortranarray(Z_1.todense().T).shape
(24192, 10)
>>> np.asfortranarray(ind_mat.T).shape
(24192, 600)
>>> Z_2 = spams.lasso(np.asfortranarray(Z_1.todense().T), D=np.asfortranarray(ind_mat.T), lambda1=params['lambda1'], verbose=True) # can change lambda to be something else
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../lib/python3.8/site-packages/spams/spams.py", line 456, in lasso
    (indptr, indices, data, shape) = spams_wrap.lassoD(X, D, return_reg_path, L,
  File "/.../lib/python3.8/site-packages/spams_wrap/spams_wrap.py", line 228, in lassoD
    return _spams_wrap.lassoD(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'lassoD'.
  Possible C/C++ prototypes are:
    _lassoD< double >(Matrix< double > *,Matrix< double > *,Matrix< double > **,bool,int,double const,double const,constraint_type,bool const,bool const,int const,int,bool const,bool)
    _lassoD< float >(Matrix< float > *,Matrix< float > *,Matrix< float > **,bool,int,float const,float const,constraint_type,bool const,bool const,int const,int,bool const,bool)
carversh commented 1 year ago

I got it to work by also converting ind_mat to float:

Z_2 = spams.lasso(np.asfortranarray(Z_1.todense().T), D=np.asfortranarray(ind_mat.T).astype(float), lambda1=params['lambda1'], verbose=True) # can change lambda to be something else
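
Why astype(float) likely mattered, as far as the traceback shows: the listed prototypes accept only Matrix< double > and Matrix< float >, so an integer array matches neither, even when it is Fortran-ordered. The thread never shows ind_mat's dtype; the sketch below assumes an integer 0/1 indicator matrix and uses a small hypothetical one.

```python
import numpy as np

# Hypothetical indicator matrix with integer 0/1 entries (assumed dtype;
# the real ind_mat is 600 x 24192, this one is 6 x 40).
ind_mat = np.zeros((6, 40), dtype=np.int64)
ind_mat[np.arange(40) % 6, np.arange(40)] = 1

# Fortran order alone does not change the element type.
D = np.asfortranarray(ind_mat.T)
print(D.flags["F_CONTIGUOUS"], D.dtype.kind)  # True i

# Converting dtype and layout in one step gives float64 in column-major order.
D_float = np.asfortranarray(ind_mat.T, dtype=np.float64)
print(D_float.flags["F_CONTIGUOUS"], D_float.dtype)  # True float64
```

In the working call above, .astype(float) runs after np.asfortranarray; astype defaults to order='K', which keeps the column-major layout, so that order of operations also works.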