Bruce-Lee-LY cuda_hgemm issues - Githubissues

Bruce-Lee-LY / cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

MIT License

290 stars 66 forks

issues


Incorrect results with enable_check 1

#12 cokeshao closed 2 months ago
2
With WMMA, padding matrix A by 8 does not seem to fully resolve the bank conflict problem?

#11 luliyucoordinate closed 5 months ago
0
Why does matrix B need to be transposed?

#10 luliyucoordinate closed 5 months ago
0
Question about a code detail in `wmma_async_stage2.cu`

#9 luliyucoordinate closed 5 months ago
0
About the permute implementation

#8 feiyuvl closed 9 months ago
2
About the layout of matrices A and B

#7 feiyuvl closed 9 months ago
1
Question about the tiling size

#6 macto94 closed 10 months ago
2
Cooperative Async Copies

#5 FabianSchuetze closed 10 months ago
2
Question: shared memory bank conflicts

#4 matrix97317 closed 10 months ago
1
Change to block of 128 by 256

#3 yupei-ms closed 1 year ago
3
#define CHUNK_K 2 // 32 / WMMA_K

#2 lk137095576 closed 1 year ago
1
mma_naive produces incorrect results

#1 FdyCN closed 1 year ago
1