Bruce-Lee-LY / cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
MIT License

mma_naive produces incorrect results #1

Closed FdyCN closed 1 year ago

FdyCN commented 1 year ago

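I'm on Windows 11 + CUDA 11.8 + sm_86. I tested the kernel in mma_naive.cu and the results are incorrect. I'm still reviewing this locally to confirm; if you have time, could you check whether there really is a problem?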

Bruce-Lee-LY commented 1 year ago

This project has not been adapted for Windows, and there are currently no plans to support Windows.