Encountered an error when using cuSPARSELt with multithreading: CUSPARSE API failed with internal error (7)

x574chen commented 2 months ago

Configurations

cuSPARSELt version: 0.6.2
Hardware: two NVIDIA A10 GPUs
CUDA version: 12.1
Driver: 550.90.07

Problem

Our team is integrating cuSPARSELt into a custom inference engine to improve quantization performance. We have successfully run Qwen2-7B (a large model structurally similar to Llama 3) on a single GPU. However, when using multiple GPUs, cusparseLtMatmul returns an internal error. The issue occurs only with multithreading; running one process per GPU works without problems. The problem can be reproduced with the GitHub example matmul_example.cpp.

Error Log

CUSPARSE API failed at line 287 with error: internal error (7)

How to reproduce

Modify the latest cuSPARSELt/matmul/matmul_example.cpp to add multithreading:

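#include <thread>  // in addition to the includes already in matmul_example.cpp

int run(int device_id) {
    // Bind this thread to its GPU before creating any per-device resources.
    CHECK_CUDA(cudaSetDevice(device_id));
    // Each thread creates and owns its own non-blocking stream.
    cudaStream_t stream;
    CHECK_CUDA(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
    // ... remainder identical to the original matmul_example.cpp ...
}

int main(void) {
    const int numThreads = 2;  // one thread per GPU
    std::thread threads[numThreads];
    for (int i = 0; i < numThreads; i++) {
        threads[i] = std::thread(run, i);
    }
    for (int i = 0; i < numThreads; i++) {
        threads[i].join();
    }
    return 0;
}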

The complete modified C++ file can be downloaded from the forked repo.
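One side note on the repro: std::thread discards the int returned by run(), so a nonzero status from one GPU's worker is only visible in the printed log. The sketch below, assuming (as in the sample) that the CHECK_* macros report failure by returning a nonzero value from run(), uses std::async to keep each worker's status. The name run_all is hypothetical and not part of the sample.

```cpp
#include <cassert>
#include <cstdio>
#include <future>
#include <vector>

// Hypothetical helper: runs work(device_id) on its own thread for each device
// and returns how many workers reported a nonzero status.
// `work` is expected to have the same signature as run(int) in the repro.
int run_all(int num_devices, int (*work)(int)) {
    std::vector<std::future<int>> results;
    results.reserve(num_devices);
    for (int i = 0; i < num_devices; ++i) {
        // std::launch::async forces a separate thread per call, like std::thread,
        // but the future keeps the returned status instead of discarding it.
        results.push_back(std::async(std::launch::async, work, i));
    }
    int failures = 0;
    for (int i = 0; i < num_devices; ++i) {
        int status = results[i].get();  // waits like join(), then yields the status
        if (status != 0) {
            std::fprintf(stderr, "device %d: run failed with status %d\n", i, status);
            ++failures;
        }
    }
    return failures;
}
```

In the modified example, main() could then return run_all(numThreads, run) == 0 ? 0 : 1 instead of joining bare threads, so the process exit code reflects a failure on any GPU.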

Question

Does cuSPARSELt support multithreading? If it does, are we using it incorrectly in the modified matmul_example.cpp?

j4yan commented 2 months ago

@x574chen cuSPARSELt does support multithreading.

I wasn't able to reproduce the failure using your code. Could you try 1) changing line 260 to cudaStream_t streams[1] = {stream}; and 2) setting the environment variable CUSPARSELT_LOG_LEVEL=5, then share the log?

x574chen commented 2 months ago

@j4yan Hi, I updated matmul.cpp and got the same error.

Log: cusparse_error.log

Which Docker image are you using? If it is public, I can try it.

j4yan commented 1 month ago

@x574chen I was able to reproduce the error in a certain environment. I will need more time to look into it.

j4yan commented 1 month ago

Hi @x574chen, it turns out cuSPARSELt does not work as expected here. We are working on it and hope to fix the issue in a future release.

x574chen commented 1 week ago

The latest version solves the multithreading issue. Closing it.

NVIDIA / CUDALibrarySamples