Closed shayandavoodii closed 8 months ago
In the meantime maybe try both and see which one fails due to a shape mismatch?
Sure, I tried. But the result does not match my expectation:
julia> using Lasso
julia> x = rand(5, 1); y = rand(5);
julia> m = fit(GammaLassoPath, x, y);
julia> coef(m)
2×50 SparseArrays.SparseMatrixCSC{Float64, Int64} with 100 stored entries:
⎡⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⎤
⎣⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⎦
I expect a 5×50 or 50×5 sparse matrix in this case! This is why I asked for further elaboration on the size of X and y in the docs.
Classically for regression purposes, if you have n samples of dimension d each, X is expected to be a matrix with n rows and d columns, while y is expected to be a vector of length n.
From what I understand, the coefficients of your lasso path here correspond to 50 different values of the regularization lambda. Each one gives rise to 2 coefficients, one for the only feature in X (because d = 1) and one for the intercept. Does that help?
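The shape convention described above can be checked with a small sketch. It assumes Lasso.jl is installed and reuses only the `fit(GammaLassoPath, X, y)` and `coef` calls that already appear in this thread; the sizes `n = 20` and `d = 3` and the random data are made up for illustration.

```julia
using Lasso, Random

Random.seed!(1)            # arbitrary seed, only to make the run repeatable

n, d = 20, 3               # hypothetical sizes: n samples, d features
X = rand(n, d)             # one row per sample, one column per feature
y = rand(n)                # one response per sample, so length(y) == size(X, 1)

path = fit(GammaLassoPath, X, y)
B = coef(path)             # sparse matrix, one column per lambda on the path

# With the default intercept, each column holds d + 1 values:
# the intercept followed by one coefficient per feature.
nrows, nlambdas = size(B)
println("coefficient matrix is ", nrows, "×", nlambdas)
```

The number of columns depends on how many lambda values the path ends up using, so it is printed here rather than assumed.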
I think I got my answer. So, in my example, I should use x = rand(1, 5) and y = [rand()], because I have one sample with five features. Then:
julia> m = fit(GammaLassoPath, x, y)
┌ Warning: One of the predicators (columns of X) is a constant, so it can not be standardized.
│ To include a constant predicator set standardize = false and intercept = false
So, I should follow the instructions:
julia> m = fit(GammaLassoPath, x, y, standardize=false, intercept=false);
julia> coef(m)
5×76 SparseArrays.SparseMatrixCSC{Float64, Int64} with 75 stored entries:
⎡⠠⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⎤
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎦
That is aligned with what I expect. Now, how can I choose one of the coefficient series in the returned sparse matrix?
julia> coef(m)[:, 1]
5-element SparseArrays.SparseVector{Float64, Int64} with 0 stored entries
julia> coef(m)[:, 2]
5-element SparseArrays.SparseVector{Float64, Int64} with 1 stored entry:
[2] = 0.0231384
I expect a vector of length 5 in each. However, it returns a scalar with weird indexing.
I think I got it:
julia> coef(m) |> Matrix
5×76 Matrix{Float64}:
0.0 0.0 0.0 0.0 … 0.0 0.0 0.0
0.0 0.0231384 0.0452252 0.0663081 0.492017 0.492793 0.493533
0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0
This is not a scalar; it is a sparse vector of length 5 with only one nonzero entry. The reason for this behavior is that Lasso coefficients are meant to be sparse, i.e., to have few nonzero entries, so only the nonzero ones are stored and printed.
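A minimal sketch of that point, using only the SparseArrays standard library. The vector is built by hand to mimic the column printed earlier in the thread; it is not taken from a fitted model.

```julia
using SparseArrays

# Hand-built stand-in for coef(m)[:, 2]: length 5, a single stored entry at index 2.
col = sparsevec([2], [0.0231384], 5)

length(col)   # 5: the vector still has five elements
nnz(col)      # 1: only one of them is stored explicitly
col[1]        # 0.0: entries that are not stored read as zero
col[2]        # 0.0231384

# Convert to an ordinary dense vector when all five values should be displayed.
dense = Vector(col)
```

Indexing and `length` behave like they do for a dense vector; only the storage and the printed form differ.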
Thank you. It seems that I reached the answer to my question. Thank you for your help and elaboration.
In the Lasso.md, a method of fit is introduced as follows:
Is it possible to provide an example in the documentation or mention the acceptable shape of X and y? I.e., X should be of size $n\times m$, and y should be a vector of length $m$. I have a problem using the method since I don't know what the acceptable size of these two arguments is. I believe they should have something in common; for example, the length of y should be equal to nrows(X) or ncols(X).
P.S.: In my case study, I have an X of size $d\times w$, and a y of length $d$. I don't know if I should pass X or X' as the second argument. I expect to get a matrix of coefficients of size $n\times d$ or $d\times n$.