NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

Why are cublasMMWrapper operations protected by a mutex? #340

Open: chenfucn opened this issue 2 years ago

chenfucn commented 2 years ago

It seems to me that all cuBLAS and cuBLASLt GEMM and matmul operations are protected by a mutex. The cuBLAS library documentation says it is thread safe. Why is it necessary to protect these operations with a mutex?

byshiue commented 2 years ago

From the official documentation:

The library is thread safe and its functions can be called from multiple host threads, even with the same handle. When multiple threads share the same handle, extreme care needs to be taken when the handle configuration is changed because that change will affect potentially subsequent cuBLAS calls in all threads. It is even more true for the destruction of the handle. So it is not recommended that multiple thread share the same cuBLAS handle.

Thread safety is only guaranteed when threads don't share a handle.
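To illustrate why a wrapper around a shared handle serializes its calls, here is a minimal C++ sketch that uses no cuBLAS at all. `FakeHandle`, `SharedHandleWrapper`, and `AllCallsSawOwnConfig` are hypothetical names, and the single `stream` field stands in for whatever per-handle state (stream, workspace, math mode) a real call configures before launching. It assumes that "configure the shared handle, then issue the GEMM" must happen as one step, and shows a `std::mutex` making that step atomic across threads; it is not FasterTransformer's actual implementation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library handle: just mutable configuration.
struct FakeHandle {
    int stream = 0;
};

// Hypothetical wrapper: many threads share one handle, guarded by one mutex.
class SharedHandleWrapper {
public:
    SharedHandleWrapper(FakeHandle* handle, std::mutex* mu) : handle_(handle), mu_(mu) {}

    // Setting the configuration and using it must be one atomic step;
    // without the lock, another thread could change `stream` in between.
    bool Gemm(int stream) {
        std::lock_guard<std::mutex> lock(*mu_);
        handle_->stream = stream;           // analogous to setting the handle's stream
        std::this_thread::yield();          // widen the window a race would need
        return handle_->stream == stream;   // analogous to the GEMM reading that stream
    }

private:
    FakeHandle* handle_;
    std::mutex* mu_;
};

// Runs `calls` GEMMs on each of `threads` threads sharing one handle and
// returns true if every call observed the configuration it set itself.
bool AllCallsSawOwnConfig(int threads, int calls) {
    FakeHandle handle;
    std::mutex mu;
    std::atomic<int> mismatches{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            SharedHandleWrapper wrapper(&handle, &mu);
            for (int i = 0; i < calls; ++i) {
                if (!wrapper.Gemm(t + 1)) mismatches.fetch_add(1);
            }
        });
    }
    for (auto& w : workers) w.join();
    return mismatches.load() == 0;
}
```

Without the `lock_guard`, one thread could overwrite `stream` between another thread's write and read, so a call could run with a configuration it did not set. The alternative the quoted documentation recommends is one handle per thread, which avoids the lock at the cost of creating and managing more handles.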

chenfucn commented 2 years ago

Thanks! But why are some of the cuSPARSELt calls protected and others not?

byshiue commented 2 years ago

All GEMMs are protected by mutexes except the one at https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/utils/cublasMMWrapper.cc#L358, which is not currently used. We will fix it ASAP.