According to the cuBLAS docs (https://docs.nvidia.com/cuda/cublas/#cublasgemmalgo-t):
"cublasGemmAlgo_t type is an enumerant to specify the algorithm for matrix-matrix multiplication on GPU architectures up to sm_75. On sm_80 and newer GPU architectures, this enumerant has no effect."
Given that, do we still need to run gpt_gemm on sm_80 and newer GPU architectures?
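For context, what gpt_gemm does is essentially benchmark the exposed `cublasGemmAlgo_t` values for each GEMM shape and cache the fastest one. A minimal sketch of that idea (my own illustration, not FasterTransformer's actual code; the FP32 types and 512x512x512 shape are arbitrary assumptions — requires a CUDA GPU to run):

```cpp
// Sketch of a gpt_gemm-style autotuning loop: time one GEMM per
// cublasGemmAlgo_t value. On sm_75 and older, different algo values can
// select different kernels; on sm_80+ the docs say the value is ignored,
// so all timings should collapse to the heuristic's single choice.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int m = 512, n = 512, k = 512;  // assumed shape, not from the issue
    float *A, *B, *C;
    cudaMalloc(&A, sizeof(float) * m * k);
    cudaMalloc(&B, sizeof(float) * k * n);
    cudaMalloc(&C, sizeof(float) * m * n);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // CUBLAS_GEMM_DEFAULT is -1; the numbered algos run 0..23.
    for (int algo = CUBLAS_GEMM_DEFAULT; algo <= CUBLAS_GEMM_ALGO23; ++algo) {
        cudaEventRecord(start);
        cublasStatus_t st = cublasGemmEx(
            handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
            &alpha, A, CUDA_R_32F, m, B, CUDA_R_32F, k,
            &beta, C, CUDA_R_32F, m,
            CUBLAS_COMPUTE_32F, static_cast<cublasGemmAlgo_t>(algo));
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        if (st == CUBLAS_STATUS_SUCCESS)
            printf("algo %2d: %.3f ms\n", algo, ms);
    }

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

If the documentation is accurate, on an A10 (sm_86) every algo value should hit the same kernel, which is exactly why the value of running gpt_gemm there is in question.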
Branch/Tag/Commit
main
Docker Image Version
nvcr.io/nvidia/pytorch:22.08-py3
GPU name
A10
CUDA Driver
525.105.17
Reproduced Steps