cuBLAS single precision issue

ww5862 commented 4 months ago

hello

I'm using cublasSgemm to compute a single-precision GEMM with dimensions 1024x1024x1024. When I compare the cublasSgemm output against a CUTLASS single-precision GEMM kernel, validation fails. However, comparing the CUTLASS single-precision GEMM kernel against CPU code passes validation. My test setup is an RTX 3090 with nvcc 12.4. Is cuBLAS unable to compute correct results when using a modern nvcc with an RTX 3090?

rsdubtso commented 4 months ago

Hello @ww5862.

Thanks for the report. A few questions:

Can you please post the output of a test run after setting the environment variable CUBLASLT_LOG_MASK=64 (e.g. export CUBLASLT_LOG_MASK=64 in bash)? (documentation)
How do you compare the results? By default, cuBLAS uses several optimizations that change the order of operations, which affects the result, but the result should remain close in most cases.
Do you still see the difference if you call cublasSetMathMode(handle, CUBLAS_PEDANTIC_MATH)? (documentation)
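
To illustrate the second question, here is a minimal CPU-only C++ sketch of why the comparison method matters. It assumes nothing about how cuBLAS or CUTLASS actually order their accumulations. It only shows that summing the same float products in a different order can give a different result, and how a tolerance-based check differs from an exact one. The helper names (dot_sequential, dot_chunked, nearly_equal, count_mismatches) and the tolerance values are hypothetical, chosen for illustration, and are not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dot product accumulated strictly left to right, like a naive CPU reference loop.
float dot_sequential(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Same dot product, but accumulated in chunks whose partial sums are added
// afterwards. This is only a stand-in for "a different order of operations";
// it is not a model of any specific GPU kernel.
float dot_chunked(const std::vector<float>& a, const std::vector<float>& b,
                  std::size_t chunk) {
    assert(a.size() == b.size());
    assert(chunk > 0);
    float total = 0.0f;
    for (std::size_t start = 0; start < a.size(); start += chunk) {
        const std::size_t end = std::min(a.size(), start + chunk);
        float partial = 0.0f;
        for (std::size_t i = start; i < end; ++i) {
            partial += a[i] * b[i];
        }
        total += partial;
    }
    return total;
}

// Mixed absolute/relative tolerance check. The default tolerances are
// illustrative assumptions, not values recommended by cuBLAS or CUTLASS.
bool nearly_equal(float x, float y, float rel_tol = 1e-4f, float abs_tol = 1e-5f) {
    const float scale = std::max(std::fabs(x), std::fabs(y));
    return std::fabs(x - y) <= abs_tol + rel_tol * scale;
}

// Counts the elements of two result buffers that differ by more than the tolerance.
std::size_t count_mismatches(const std::vector<float>& got,
                             const std::vector<float>& ref,
                             float rel_tol = 1e-4f, float abs_tol = 1e-5f) {
    assert(got.size() == ref.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < got.size(); ++i) {
        if (!nearly_equal(got[i], ref[i], rel_tol, abs_tol)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

If the validation compares the two GPU outputs with exact equality (==), small reordering differences like those between dot_sequential and dot_chunked will be reported as failures. Running count_mismatches on the cuBLAS and CUTLASS output buffers (copied back to the host) with a tolerance suited to the input data is one way to tell rounding differences apart from a genuinely wrong result.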

ww5862 commented 4 months ago

Thank you, I will do it!

NVIDIA / CUDALibrarySamples

cuBLAS single precision issue #181