zhoeujei opened this issue 2 months ago
The SpMM routine can do this. The API is phrased as `dense = sparse * dense`, but you can achieve `dense = dense * sparse` by transposing everything.
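For anyone landing here: the transposition trick rests on the identity C = B·A ⇔ C^T = A^T·B^T, so a routine that only computes sparse × dense can still deliver dense × sparse. A minimal plain-Python sketch (illustrative only, not cuSPARSE — all function names here are made up) of how that works with a toy CSR SpMM:

```python
def spmm(indptr, indices, data, B):
    """C = A @ B, with the sparse A given in CSR form."""
    m, n = len(indptr) - 1, len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(indptr[i], indptr[i + 1]):
            j, v = indices[p], data[p]
            for c in range(n):
                C[i][c] += v * B[j][c]
    return C

def csr_transpose(indptr, indices, data, n_cols):
    """CSR arrays of A^T (equivalently, the CSC representation of A)."""
    m = len(indptr) - 1
    t_indptr = [0] * (n_cols + 1)
    for j in indices:                       # count entries per column of A
        t_indptr[j + 1] += 1
    for j in range(n_cols):                 # prefix-sum into row pointers
        t_indptr[j + 1] += t_indptr[j]
    t_indices, t_data = [0] * len(indices), [0.0] * len(data)
    pos = t_indptr[:]
    for i in range(m):                      # scatter entries into A^T's rows
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            t_indices[pos[j]], t_data[pos[j]] = i, data[p]
            pos[j] += 1
    return t_indptr, t_indices, t_data

def transpose(M):
    return [list(row) for row in zip(*M)]

def dense_times_sparse(B, indptr, indices, data, n_cols_A):
    """C = B @ A via C^T = A^T @ B^T, using only the sparse-left spmm."""
    t_indptr, t_indices, t_data = csr_transpose(indptr, indices, data, n_cols_A)
    return transpose(spmm(t_indptr, t_indices, t_data, transpose(B)))
```

For example, with A = [[1, 0], [2, 3]] in CSR form and B = [[1, 2], [3, 4]], `dense_times_sparse` returns B @ A = [[5, 6], [11, 12]].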
It is indeed possible to achieve this through transposition, but it reduces efficiency. Please consider adding direct support.
Thanks for the suggestion. Do you have any benchmarks, call sequences, or matrices that seem unexpectedly slow? Sharing those would help us optimize for your particular use case.
Yes. I set both the input and output matrix dimensions to 1024x1024. Calling cusparseSpMM with CUSPARSE_OPERATION_TRANSPOSE takes roughly twice as long as calling it with CUSPARSE_OPERATION_NON_TRANSPOSE. So I would like to ask whether NVIDIA provides a library for dense matrix * sparse matrix = dense matrix operations.
Okay. If I understand correctly, you are using SpMM to compute C = A^T B, where A is a CSR matrix and B, C are dense matrices, and you observe that this is slower than C = AB. This is expected. The performance loss is not due to a lack of a specialized API; it is due to the data layout of A^T. When A is stored in CSR, the entries of A^T are effectively laid out column by column, which is not an algorithmically convenient order. We can, of course, try to make it faster, but a significant performance gap is probably unavoidable.

If possible, try storing A^T instead of A and using opA = CUSPARSE_OPERATION_NON_TRANSPOSE (or, equivalently, storing A in CSC format). In that case, A^T's data is arranged row by row, and you should see faster performance.
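To make the layout point concrete, here is a toy sketch (plain Python, not cuSPARSE internals) of what computing C = A^T B directly from A's CSR arrays forces: each stored entry A[i][j] must scatter into row j of C, so consecutive CSR entries write to non-consecutive output rows. Storing A^T in CSR (or A in CSC) turns the same loop into contiguous row-wise accumulation.

```python
def spmm_transposed_csr(indptr, indices, data, n_cols_A, B):
    """C = A^T @ B with A stored in CSR.

    Each CSR entry A[i][j] contributes data[p] * B[i] to row j of C,
    so the output is touched in a scattered, column-driven order --
    the access pattern behind the slowdown discussed above.
    """
    n = len(B[0])
    C = [[0.0] * n for _ in range(n_cols_A)]
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            j, v = indices[p], data[p]
            for c in range(n):
                C[j][c] += v * B[i][c]   # scattered writes across rows of C
    return C
```

With A = [[1, 0], [2, 3]] in CSR form and B = [[1, 2], [3, 4]], this returns A^T @ B = [[7, 10], [9, 12]]; a CSR representation of A^T would produce the same result with purely row-contiguous writes.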
Why doesn't cuSPARSE directly support dense matrix * sparse matrix multiplication producing a dense matrix? Many application scenarios require it. Please consider adding support.