Open ziyuhuang123 opened 6 months ago
I am puzzled about the permutation concept. I know how CUTLASS computes GEMM (https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/). In Timeloop's permutation terms, would that be KMN at the shared-memory level and KMN at the DRAM level?
I'm not sure I understand the question. Timeloop models a multi-level hierarchy. At each level the permutation describes the pattern in which tiles are sent to the next level in the hierarchy.
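To make that concrete, here is a minimal sketch (not Timeloop code) of how a permutation at one level fixes the order in which tiles are handed to the next level. The helper `tile_order` and the tile counts are hypothetical, and the sketch assumes the permutation string is read innermost loop first; please check the Timeloop documentation for the convention your version uses.

```python
from itertools import product

def tile_order(permutation, factors):
    """Yield tile coordinates in the order a loop nest would visit them.

    permutation: string of dimension names, ASSUMED innermost loop first.
    factors: dict mapping dimension name -> number of tiles along it.
    (Hypothetical helper for illustration only, not part of Timeloop.)
    """
    # itertools.product varies its last argument fastest,
    # so the dimensions are listed outermost-first here.
    outer_to_inner = list(reversed(permutation))
    ranges = [range(factors[d]) for d in outer_to_inner]
    for idx in product(*ranges):
        yield dict(zip(outer_to_inner, idx))

# Illustrative tile counts: 2 tiles along each of M, N, K at the DRAM level.
factors = {"M": 2, "N": 2, "K": 2}

# With "KMN", K changes fastest, then M, then N.
dram_tiles = list(tile_order("KMN", factors))

# A different permutation visits the same tiles in a different order.
dram_tiles_mnk = list(tile_order("MNK", factors))

for tile in dram_tiles:
    print(tile)
```

Each level of the hierarchy (DRAM, shared memory, and so on) would have its own permutation and its own tile counts, so the same helper could be called once per level.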