According to the cuBLAS docs (https://docs.nvidia.com/cuda/cublas/#cublasgemmalgo-t):
"cublasGemmAlgo_t type is an enumerant to specify the algorithm for matrix-matrix multiplication on GPU architectures up to sm_75. On sm_80 and newer GPU architectures, this enumerant has no effect."
Given that, do we still need to run gpt_gemm on sm_80 and newer GPU architectures?
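For context, what gpt_gemm does is essentially benchmark the exposed `cublasGemmAlgo_t` values for each GEMM shape and cache the fastest one. A minimal sketch of that idea (my own illustration, not FasterTransformer's actual code; the FP32 types and 512x512x512 shape are arbitrary assumptions — requires a CUDA GPU to run):

```cpp
// Sketch of a gpt_gemm-style autotuning loop: time one GEMM per
// cublasGemmAlgo_t value. On sm_75 and older, different algo values can
// select different kernels; on sm_80+ the docs say the value is ignored,
// so all timings should collapse to the heuristic's single choice.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int m = 512, n = 512, k = 512;  // assumed shape, not from the issue
    float *A, *B, *C;
    cudaMalloc(&A, sizeof(float) * m * k);
    cudaMalloc(&B, sizeof(float) * k * n);
    cudaMalloc(&C, sizeof(float) * m * n);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // CUBLAS_GEMM_DEFAULT is -1; the numbered algos run 0..23.
    for (int algo = CUBLAS_GEMM_DEFAULT; algo <= CUBLAS_GEMM_ALGO23; ++algo) {
        cudaEventRecord(start);
        cublasStatus_t st = cublasGemmEx(
            handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
            &alpha, A, CUDA_R_32F, m, B, CUDA_R_32F, k,
            &beta, C, CUDA_R_32F, m,
            CUBLAS_COMPUTE_32F, static_cast<cublasGemmAlgo_t>(algo));
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        if (st == CUBLAS_STATUS_SUCCESS)
            printf("algo %2d: %.3f ms\n", algo, ms);
    }

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

If the documentation is accurate, on an A10 (sm_86) every algo value should hit the same kernel, which is exactly why the value of running gpt_gemm there is in question.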
Branch/Tag/Commit
main
Docker Image Version
nvcr.io/nvidia/pytorch:22.08-py3
GPU name
A10
CUDA Driver
525.105.17
Reproduced Steps