InfiniTensor / InfiniGen

Apache License 2.0

Matrix multiplication operator support #52

Open wanghailu0717 opened 8 months ago

wanghailu0717 commented 8 months ago

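1. Analyze the computation of the matrix multiplication operator in depth and identify the following potential performance opportunities:
   1.1 Parallelism
   1.2 Efficient IO
   1.3 Efficient computation
2. Consider the following features:
   2.1 Post-fusion: fuse an activation or the next operator after the matrix multiplication
   2.2 Pre-fusion: fuse the previous operator before the matrix multiplication
3. Provide options for multiple compute kernels, e.g. CUDA cores / Tensor Cores on the CUDA platform, and tensor cores / convolution cores on the BANG platform.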
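The performance and fusion points above can be illustrated with a minimal pure-Python sketch of a tiled GEMM. This is not InfiniGen's or CuTe's implementation; the function `tiled_gemm`, the tile sizes `bm`/`bn`/`bk`, and the `prologue`/`epilogue` hooks are hypothetical names chosen for illustration. Each output tile is independent (the unit of parallelism a GPU would typically assign to one thread block), each K-slice of A and B is staged once and reused across the whole tile (the IO reuse that shared memory provides), the accumulator stays local until the end (as registers would), and the fused ops are applied when A is loaded (pre-fusion) or right before the single write of C (post-fusion).

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def tiled_gemm(
    a: Matrix,
    b: Matrix,
    bm: int = 2,
    bn: int = 2,
    bk: int = 2,
    prologue: Optional[Callable[[float], float]] = None,
    epilogue: Optional[Callable[[float], float]] = None,
) -> Matrix:
    """Hypothetical sketch: C = epilogue(prologue(A) @ B), computed tile by tile."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Each (i0, j0) output tile is independent: the parallelism a GPU exploits.
    for i0 in range(0, m, bm):
        for j0 in range(0, n, bn):
            i1, j1 = min(i0 + bm, m), min(j0 + bn, n)
            # Per-tile accumulator, standing in for registers.
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for k0 in range(0, k, bk):
                k1 = min(k0 + bk, k)
                # Stage one K-slice of A and B (standing in for shared memory);
                # each staged value is reused for a whole row or column of the tile.
                a_tile = [[a[i][kk] for kk in range(k0, k1)] for i in range(i0, i1)]
                if prologue is not None:
                    # Pre-fusion: apply the previous op while A is loaded.
                    a_tile = [[prologue(x) for x in row] for row in a_tile]
                b_tile = [b[kk][j0:j1] for kk in range(k0, k1)]
                for ii, a_row in enumerate(a_tile):
                    for kk, a_val in enumerate(a_row):
                        b_row = b_tile[kk]
                        for jj in range(j1 - j0):
                            acc[ii][jj] += a_val * b_row[jj]
            # Post-fusion: apply the next op once per output before the single write to C.
            for ii in range(i1 - i0):
                for jj in range(j1 - j0):
                    v = acc[ii][jj]
                    c[i0 + ii][j0 + jj] = epilogue(v) if epilogue is not None else v
    return c


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0
```

For example, with `a = [[1, 2, 3], [4, 5, 6]]` and `b = [[1, -1], [0, 1], [-1, 0]]`, `tiled_gemm(a, b, epilogue=relu)` returns `[[0.0, 1.0], [0.0, 1.0]]` without ever writing the pre-activation values to C. Note the trade-off in this sketch: the prologue is recomputed every time a tile of A is reloaded (once per N-tile), which is the usual cost of pre-fusion.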

KuangjuX commented 7 months ago

Commit 32b9b15188c1287adef47ffdee5d910a16660fc9 successfully runs GEMM code generated with CuTe in InfiniGen, and its performance has been compared against cuBLAS. However, code generation currently works by copying templates directly. The next steps will include:
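A performance comparison against cuBLAS is commonly reported as achieved throughput. The sketch below shows the standard arithmetic (an M×N×K GEMM performs M·N·K multiply-adds, i.e. 2·M·N·K floating-point operations) plus a simple median-of-repeats timer. The helper names are hypothetical, and the numbers in the usage note are illustrative, not results from this issue.

```python
import statistics
import time
from typing import Callable


def gemm_flops(m: int, n: int, k: int) -> int:
    # C[m, n] = A[m, k] @ B[k, n]: each of the m*n outputs needs k multiply-adds (2*k FLOPs).
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, seconds: float) -> float:
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e12


def median_seconds(run: Callable[[], None], warmup: int = 3, repeats: int = 10) -> float:
    # For a GPU kernel, run() must block until the kernel finishes (e.g. a device
    # synchronize); otherwise only the asynchronous launch is measured.
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


def speedup_vs_baseline(candidate_seconds: float, baseline_seconds: float) -> float:
    # > 1.0 means the candidate (e.g. a generated kernel) is faster than the baseline (e.g. cuBLAS).
    return baseline_seconds / candidate_seconds
```

For instance, `tflops(1000, 1000, 1000, 0.002)` is about 1.0, since 2·10⁹ FLOPs in 2 ms is 10¹² FLOP/s. Using the median rather than the mean keeps one-off outliers (first-launch overhead, clock changes) from skewing the comparison.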

KuangjuX commented 7 months ago

Some fused kernels: