LtFp8Matmul Minimum Alignment Requirement #172

Closed · ipiszy closed this issue 7 months ago

ipiszy commented 8 months ago

Hi,

I wonder what the minimum alignment requirement is for cuBLAS FP8 matmul (the LtFp8Matmul example here). I tried different M, N, K combinations; e.g., when I set M to 63, I got error messages like:

cuBLAS API failed with status 7
terminate called after throwing an instance of 'std::logic_error'
  what():  cuBLAS API failed
Aborted (core dumped)

It seems that cuBLAS only supports shapes where M % 16 == 0, K % 16 == 0, and N % 8 == 0, but I cannot find this information in any official document.

rsdubtso commented 8 months ago

Hi @ipiszy ,

This is documented in the cublasLtMatmul() section in the part that describes supported data types. Quote:

To use FP8 kernels, the following set of requirements must be satisfied:

  • All matrix pointers must be 16-byte aligned.
  • A must be transposed and B non-transposed (The “TN” format).
  • The compute type must be CUBLAS_COMPUTE_32F.
  • The scale type must be CUDA_R_32F.
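
For concreteness, here is a minimal sketch of descriptor setup that satisfies these requirements. This is an illustration rather than the sample's exact code: the E4M3 input type and BF16 output type are assumptions, and m, n, k are placeholders that must respect the alignment constraints discussed in this issue.

```cpp
#include <cublasLt.h>
#include <cstdint>

// Sketch: FP8 matmul descriptor setup consistent with the documented rules.
// Assumptions (not taken from the sample): E4M3 inputs, BF16 output.
void setupFp8Descriptors(int64_t m, int64_t n, int64_t k) {
    // FP8 requires compute type CUBLAS_COMPUTE_32F and scale type CUDA_R_32F.
    cublasLtMatmulDesc_t matmulDesc;
    cublasLtMatmulDescCreate(&matmulDesc, CUBLAS_COMPUTE_32F, CUDA_R_32F);

    // FP8 requires the "TN" format: A transposed, B non-transposed.
    cublasOperation_t transa = CUBLAS_OP_T;
    cublasOperation_t transb = CUBLAS_OP_N;
    cublasLtMatmulDescSetAttribute(matmulDesc, CUBLASLT_MATMUL_DESC_TRANSA,
                                   &transa, sizeof(transa));
    cublasLtMatmulDescSetAttribute(matmulDesc, CUBLASLT_MATMUL_DESC_TRANSB,
                                   &transb, sizeof(transb));

    // Column-major layouts. Since A is transposed it is stored k x m with
    // lda = k; B is stored k x n with ldb = k; C is m x n with ldc = m.
    cublasLtMatrixLayout_t Adesc, Bdesc, Cdesc;
    cublasLtMatrixLayoutCreate(&Adesc, CUDA_R_8F_E4M3, k, m, k);
    cublasLtMatrixLayoutCreate(&Bdesc, CUDA_R_8F_E4M3, k, n, k);
    cublasLtMatrixLayoutCreate(&Cdesc, CUDA_R_16BF, m, n, m);

    // cublasLtMatmul(...) would follow; all device pointers passed to it must
    // be 16-byte aligned (allocations from cudaMalloc already are).
}
```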

Let me know if you have any other questions regarding FP8 in particular or cuBLAS in general.

ipiszy commented 8 months ago

I see; sorry, I missed these in the cuBLAS docs. Thank you!

ipiszy commented 7 months ago

Okay, I understand the reason now. I think it's because cuBLASLt requires column-major layouts and the TN format, so the default example requires both m and k to be multiples of 16: with 1-byte FP8 elements, the leading dimensions are lda = ldb = k and ldc = m, and every column is 16-byte aligned only when the leading dimension is a multiple of 16. A small sketch of that arithmetic follows.
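
For illustration, a tiny sketch of that column arithmetic (a hypothetical helper, not part of the sample; the actual kernel constraints may be broader, e.g. the N % 8 == 0 behavior observed above):

```cpp
#include <cstdint>

// In column-major storage with 1-byte FP8 elements, column j of a matrix with
// leading dimension ld starts at base + j * ld bytes. With the TN format,
// lda = ldb = k and ldc = m, so if the base pointers are 16-byte aligned,
// every column is 16-byte aligned exactly when m % 16 == 0 and k % 16 == 0.
bool fp8ColumnsAligned(int64_t m, int64_t k) {
    return m % 16 == 0 && k % 16 == 0;
}
```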