marcdelabarrera opened this issue 1 week ago
> M = CUSPARSE.CuSparseMatrixCSC(CuVector(M.colptr), CuVector(M.rowval), CuVector(M.nzval), (N, N))

You can simply use `cu`:

```julia
julia> cu(M)
100×100 CuSparseMatrixCSC{Float32, Int32} with 1965 stored entries:
```
About the 'issue' that `CuSparseMatrixCSC` * `CuSparseVector` results in a `CuArray`: which CUSPARSE API would you want this to use instead? AFAIK, CUSPARSE doesn't have much sparse-sparse functionality. Right now, the sparsity of the vector gets dropped by CUDA.jl in order to be able to use `cusparseSpMV`: https://github.com/JuliaGPU/CUDA.jl/blob/d72cdaa324cfadc1e67a7aa7fb9c4b035d2ec07c/lib/cusparse/interfaces.jl#L76-L79
Thank you for the tip on converting M to a `CuSparseMatrixCSC` the proper way (I'm new to CUDA). For context, I'm trying to replicate code that solves a Linear Complementarity Problem with a sparse matrix on the GPU (there are MATLAB and non-GPU Julia versions).
CUDA has support for sparse matrix × sparse matrix via `cusparseSpGEMM`, which is why I was surprised that there is no support for sparse matrix × sparse vector. I can convert the vector q into an N×1 matrix and get an N×1 sparse matrix, or I can do the multiplication, get a `CuArray`, and then convert that back into a `CuSparseVector`. Both options seem inefficient.
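For reference, the two workarounds described above might look roughly like this (a sketch only: `M` and `q` are assumed to already be a `CuSparseMatrixCSC` and a `CuSparseVector` on the GPU, and the `CuSparseVector(::CuVector)` dense-to-sparse conversion is an assumption):

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

# Option 1: view q as an N×1 sparse matrix so that M * q_mat stays sparse.
# This version round-trips through the CPU, which is part of the
# inefficiency being discussed.
q_mat = CuSparseMatrixCSC(sparse(reshape(collect(q), :, 1)))
y_mat = M * q_mat                     # N×1 sparse result

# Option 2: accept the dense CuArray result and re-sparsify afterwards.
y_dense  = M * q                      # dense CuVector
y_sparse = CuSparseVector(y_dense)    # assumed dense→sparse conversion
```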
> CUDA has support for sparse matrix × sparse matrix, `cusparseSpGEMM`; this is why I was surprised that there is no support for sparse matrix × sparse vector.
Ah, right, so you're suggesting that the `generic_matvecmul!` method above could instead reinterpret the input sparse vector as a 1-column CSC sparse matrix and use `cusparseSpGEMM`? Although that would preserve sparsity, the performance with such inputs may be bad.
Quickly testing this out using:

```julia
function CuSparseMatrixCSC(vec::CuSparseVector{Tv,Ti}) where {Tv,Ti}
    colPtr = CuVector{Ti}([1, vec.nnz + 1])
    CuSparseMatrixCSC{Tv,Ti}(colPtr, vec.iPtr, vec.nzVal, (vec.len, 1))
end

q2 = CuSparseMatrixCSC(q)  # assuming n=N, which is missing from your MWE
```
```julia
julia> @benchmark CUDA.@sync M*q
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  33.730 μs …  21.865 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     35.319 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.661 μs ± 218.323 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▂▅▅█▇█▆▆▃▄▃▂▃
  ▂▂▂▂▂▂▃▃▄▅▆████████████████▇▇▆▄▄▄▃▃▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▂▂ ▄
  33.7 μs          Histogram: frequency by time         38.8 μs <

 Memory estimate: 3.39 KiB, allocs estimate: 139.

julia> @benchmark CUDA.@sync M*q2
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  141.289 μs … 705.674 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     145.888 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   146.093 μs ±   6.037 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                           ▁▂▄█▂▁
  ▂▁▁▂▂▂▂▂▂▂▂▃▃▄▄▄▅▅▆▇▇████████▆▆▆▄▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  141 μs           Histogram: frequency by time          152 μs <

 Memory estimate: 3.41 KiB, allocs estimate: 137.
```
So doing the sparse multiplication is way slower. Also, isn't a dense output kind of reasonable here, since a sparse matrix * sparse vector is likely to yield a vector with few zeros? At least your example does. I'm not too familiar with the domain, but if your sparsity pattern is specific, maybe you can create a PR "productionizing" the conversion above so that you can use it to dispatch to `cusparseSpGEMM`, without changing the current behavior.
Thank you a lot. I guess you are right. My matrix is very sparse (well over 80% zeros), but my vector is not that sparse, so the resulting vector has almost all entries nonzero anyway.
Alright, I think we can narrow the scope to making this conversion possible then.
The output of sparse matrix * sparse vector is a dense vector in many cases. We can add a dispatch for it, but it will be slow unless the user knows that it can be applied efficiently to their problem.
We can reuse this trick: https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/extra.jl#L84
I implemented it so that we can add two sparse vectors; NVIDIA only provides a routine for adding two sparse matrices.
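A rough sketch of what that trick looks like for vector addition (the one-column-matrix constructor shape follows the snippet earlier in the thread; the exact `CuSparseVector` constructor signature and the `rowVal` field name are assumptions, not the actual `extra.jl` implementation):

```julia
using CUDA, CUDA.CUSPARSE

# View each sparse vector as a one-column CSC matrix, add the matrices
# with the sparse matrix + sparse matrix routine, then read back the
# single column as a sparse vector.
function spvec_add(x::CuSparseVector{Tv,Ti}, y::CuSparseVector{Tv,Ti}) where {Tv,Ti}
    as_mat(v) = CuSparseMatrixCSC{Tv,Ti}(CuVector{Ti}([one(Ti), Ti(v.nnz + 1)]),
                                         v.iPtr, v.nzVal, (v.len, 1))
    Z = as_mat(x) + as_mat(y)          # dispatches to sparse-sparse addition
    CuSparseVector{Tv,Ti}(Z.rowVal, Z.nzVal, x.len)   # assumed constructor
end
```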
> We can reuse this trick: https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/extra.jl#L84

That's the same as https://github.com/JuliaGPU/CUDA.jl/issues/2484#issuecomment-2331822558, right? It should use a single column, Julia being column-major.
> > We can reuse this trick: https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/extra.jl#L84
>
> That's the same as #2484 (comment), right? It should use a single column, Julia being column-major.
Yes, but you add an additional layer of transformation if you use CSC matrices.
Julia is column-major, but many CUSPARSE routines are only implemented for sparse CSR matrices. Sparse matrix × matrix is one example of a CUDA routine implemented only for CSR × CSR. If something works with CSC matrices (like sparse triangular solves or sparse CSC * CSC), it's because we exploit the relation between A and A' in the CSC / CSR formats: https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/generic.jl#L567
Oh right, I didn't notice it's CSR rather than CSC. I guess we could just provide both constructors, as this is going to be a manually-invoked conversion anyway.
Yes, I can open a PR later this week for that.
Multiplying a `CuSparseMatrixCSC` and a `CuSparseVector` returns a `CuArray` instead of a `CuSparseVector`. Broadcasting with that result then fails with:

```julia
broadcast with sparse arrays is currently only implemented for CSR and CSC matrices
```

Is there a way to handle these operations? An obvious one is to convert the `CuArray` back to a `CuSparseVector` and proceed with the code, but I am not sure if this is what we are expected to do.
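That stopgap might look like the following (sketch only; `M` and `q` are the issue's GPU sparse matrix and vector, and the `CuSparseVector(::CuVector)` dense-to-sparse conversion is assumed to be available):

```julia
using CUDA, CUDA.CUSPARSE

y      = M * q               # currently returns a dense CuArray
q_next = CuSparseVector(y)   # convert back to sparse to continue the algorithm
```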