Closed x574chen closed 1 week ago
@x574chen cuSPARSELt does support multithreading.
I wasn't able to reproduce the failure using your code. Could you try 1) changing line 260 to cudaStream_t streams[1] = {&stream}; and 2) setting the environment variable CUSPARSELT_LOG_LEVEL=5 and sharing the log?
@j4yan Hi, I updated matmul.cpp and got the same error.
Log: cusparse_error.log
Which Docker image are you using? If it is public, I can try it.
@x574chen I was able to reproduce the error in a certain environment. I will need more time to look into it.
Hi @x574chen, it turns out cuSPARSELt doesn't work as expected here. We are working on it and hope to fix the issue in a future release.
The latest version solved the multithreading issue. Closing it.
Configurations
Problem
Our team is integrating cuSPARSELt into a custom inference engine to improve quantization performance. We've successfully run Qwen2-7B (a large model structurally similar to Llama 3) on a single GPU. However, when using multiple GPUs, cusparseLtMatmul throws an internal error. I've found that the issue occurs only with multithreading (one thread per GPU); multiprocessing works without problems. The problem can be reproduced with the GitHub example matmul_example.cpp.
Error Log
CUSPARSE API failed at line 287 with error: internal error (7)
How to reproduce
Modify the latest cuSPARSELt/matmul/matmul_example.cpp to add multithreading:
The complete modified C++ file can be downloaded from the forked repo.
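The modified code itself did not survive in this thread, so the following is only a minimal sketch of the threading pattern described above (one host thread per GPU), not the actual change from the fork. The names run_matmul_on_device and run_on_all_devices are hypothetical, and the worker body is a stub: in a real reproduction it would presumably call cudaSetDevice(device) and then run the handle setup, pruning, compression, and cusparseLtMatmul sequence from matmul_example.cpp, with the handle and stream created inside that thread.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the body of matmul_example.cpp.
// Assumption: a real version would select the GPU with cudaSetDevice(device),
// create its own cuSPARSELt handle and CUDA stream, run the matmul, and
// return 0 on success or a non-zero status on failure.
int run_matmul_on_device(int device) {
    (void)device;  // unused in this stub
    return 0;      // the stub always reports success
}

// Launch one worker thread per device and collect each worker's status.
// Every thread writes only to its own slot of `status`, so no locking is needed.
std::vector<int> run_on_all_devices(int num_devices) {
    assert(num_devices >= 0);
    std::vector<int> status(num_devices, -1);
    std::vector<std::thread> workers;
    workers.reserve(num_devices);
    for (int d = 0; d < num_devices; ++d) {
        workers.emplace_back([d, &status] {
            status[d] = run_matmul_on_device(d);
        });
    }
    for (auto& w : workers) {
        w.join();  // wait for every device before inspecting results
    }
    return status;
}
```

Calling run_on_all_devices(2) from main would mirror the two-GPU multithreaded case; the multiprocessing variant that works for us corresponds to launching the unmodified example once per GPU as separate processes.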
Question
Does cuSPARSELt support multithreading? If so, are we using it incorrectly in our modified matmul_example.cpp?