NVIDIA / CUDALibrarySamples

CUDA Library Samples

cublasLt FP8 gemm error #185

Closed · Sunny-bot1 closed 3 months ago

Sunny-bot1 commented 3 months ago

Hi, when I implemented batched FP8 GEMM based on the LtFp8Matmul demo, I ran into this problem:

cuBLAS API failed with status 7
terminate called after throwing an instance of 'std::logic_error'
  what():  cuBLAS API failed

I implemented the batch mode like this:

int batchCount = 2;
int stridea = m * k;
cublasLtMatrixLayoutSetAttribute(Adesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
cublasLtMatrixLayoutSetAttribute(Adesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &stridea, sizeof(stridea));
...

I also set the initial argument N=2, and the original LtFp8Matmul sample runs fine.

Architecture: Ada; CUDA version: 12.4

Thanks for your help!

rsdubtso commented 3 months ago

Batch count must be equal for all the matrices. Output from the run with CUBLASLT_LOG_LEVEL=1:

[2024-05-20 14:52:29][cublasLt][3590223][Error][cublasLtMatmulAlgoGetHeuristic] Input matrices batch counts mismatch: input B matrix batchCount (1) must be equal to all other matrices batch counts, expected (2).

See the LtHSHgemmStridedBatchSimple example for more info.

Sunny-bot1 commented 3 months ago

> Batch count must be equal for all the matrices. Output from the run with CUBLASLT_LOG_LEVEL=1:
>
> [2024-05-20 14:52:29][cublasLt][3590223][Error][cublasLtMatmulAlgoGetHeuristic] Input matrices batch counts mismatch: input B matrix batchCount (1) must be equal to all other matrices batch counts, expected (2).
>
> See the LtHSHgemmStridedBatchSimple example for more info.

Thank you for your reply. Sorry, I didn't describe it completely; I only showed A as an example. I have set the batch count for all of the matrices:

   int batchCount = 2;
   int stridea = m * k;
   int strideb = n * k;
   int stridec = m * n;
   cublasLtMatrixLayoutSetAttribute(Adesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
   cublasLtMatrixLayoutSetAttribute(Adesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &stridea, sizeof(stridea));
   cublasLtMatrixLayoutSetAttribute(Bdesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
   cublasLtMatrixLayoutSetAttribute(Bdesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &strideb, sizeof(strideb));
   cublasLtMatrixLayoutSetAttribute(Cdesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
   cublasLtMatrixLayoutSetAttribute(Cdesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &stridec, sizeof(stridec));
   cublasLtMatrixLayoutSetAttribute(Ddesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
   cublasLtMatrixLayoutSetAttribute(Ddesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &stridec, sizeof(stridec));

but I still hit the same problem:

cuBLAS API failed with status 7
terminate called after throwing an instance of 'std::logic_error'
  what():  cuBLAS API failed
rsdubtso commented 3 months ago

Thanks. The problem with the code above is that the strides must be int64_t. Therefore, all of the calls that set the stride fail with CUBLAS_STATUS_INVALID_VALUE. See the documentation for details on the expected types.
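
For reference, a corrected version of the fragment might look like the sketch below. It assumes the same Adesc/Bdesc/Cdesc/Ddesc layout descriptors and m, n, k dimensions from the earlier comment; the only change is the stride type (the batch count remains a 32-bit value per the attribute's documented type):

```cpp
// Sketch only — assumes Adesc/Bdesc/Cdesc/Ddesc and m, n, k already exist
// as in the snippet above. cublasLtMatrixLayoutSetAttribute validates the
// buffer size against the attribute's documented type, so the strided-batch
// offsets must be passed as int64_t, not int.
int32_t batchCount = 2;
int64_t stridea = (int64_t)m * k;
int64_t strideb = (int64_t)n * k;
int64_t stridec = (int64_t)m * n;
cublasLtMatrixLayoutSetAttribute(Adesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
cublasLtMatrixLayoutSetAttribute(Adesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &stridea, sizeof(stridea));
cublasLtMatrixLayoutSetAttribute(Bdesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
cublasLtMatrixLayoutSetAttribute(Bdesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &strideb, sizeof(strideb));
cublasLtMatrixLayoutSetAttribute(Cdesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
cublasLtMatrixLayoutSetAttribute(Cdesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &stridec, sizeof(stridec));
cublasLtMatrixLayoutSetAttribute(Ddesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &batchCount, sizeof(batchCount));
cublasLtMatrixLayoutSetAttribute(Ddesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &stridec, sizeof(stridec));
```

Checking the cublasStatus_t returned by each call (rather than discarding it) would have surfaced the CUBLAS_STATUS_INVALID_VALUE at the failing call instead of later in cublasLtMatmulAlgoGetHeuristic.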

Sunny-bot1 commented 3 months ago

> Thanks. The problem with the code above is that the strides must be int64_t. Therefore, all of the calls that set the stride fail with CUBLAS_STATUS_INVALID_VALUE. See the documentation for details on the expected types.

I see. Thank you very much!!!