mma_naive produces incorrect results

Bruce-Lee-LY / cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

MIT License

290 stars 66 forks

Closed: FdyCN closed this issue 1 year ago

FdyCN commented 1 year ago

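I'm on Windows 11 + CUDA 11.8 + sm_86. I tested the kernel in mma_naive.cu and the result is incorrect. I'm still reviewing this locally to confirm; if you have time, could you check whether there really is a problem?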
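For anyone trying to reproduce this, one way to confirm that a kernel's output is wrong is to compare it against a plain CPU reference after copying the result back to the host. The following is only a minimal host-side sketch, not code from this repository: `reference_gemm` and `max_abs_diff` are hypothetical helper names, values are assumed to have been converted from half to float already, and all matrices are assumed to be row-major, which may not match the layout the kernel under test expects.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: CPU reference for C = A * B.
// Assumed layout (adjust to the kernel under test): A is m x k, B is k x n,
// C is m x n, all row-major, all already converted to float.
std::vector<float> reference_gemm(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}

// Hypothetical helper: largest element-wise absolute difference.
// Returns +infinity when the sizes differ, so a shape mismatch never
// looks like a passing comparison.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    if (x.size() != y.size()) {
        return std::numeric_limits<float>::infinity();
    }
    float worst = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::max(worst, std::fabs(x[i] - y[i]));
    }
    return worst;
}
```

Because half-precision accumulation loses accuracy as the inner dimension grows, the difference should be judged against a tolerance chosen for the problem size rather than against zero.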

Bruce-Lee-LY commented 1 year ago

This project has not been adapted for Windows, and there are currently no plans to do so.