Closed wzhcz8902 closed 1 month ago
Should shared memory usage be checked?
It is nice to have, but not critical. Shared memory usage is decided statically, so a wrong tile size and stage combination should not be instantiated in the first place. The `can_implement` function mostly checks runtime values.
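To illustrate what "decided statically" can look like, here is a minimal sketch, not CUTLASS code. The names `smem_bytes`, `max_stages`, `GemmConfig` and `kSmemLimitBytes` are hypothetical. The 96 KiB budget and the "one A tile plus one B tile per stage" model are assumptions. The real limit depends on the GPU architecture and the real footprint on the kernel's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: each pipeline stage holds one A tile (M x K) and one
// B tile (K x N) in shared memory, so usage grows linearly with `stages`.
constexpr std::size_t smem_bytes(int tile_m, int tile_n, int tile_k,
                                 int stages, std::size_t elem_bytes) {
  return static_cast<std::size_t>(stages) *
         (static_cast<std::size_t>(tile_m) * tile_k +
          static_cast<std::size_t>(tile_k) * tile_n) *
         elem_bytes;
}

// Assumed per-block shared memory budget (architecture dependent in practice).
constexpr std::size_t kSmemLimitBytes = 96 * 1024;

// Largest stage count that still fits in `limit` bytes under the model above.
constexpr int max_stages(int tile_m, int tile_n, int tile_k,
                         std::size_t elem_bytes, std::size_t limit) {
  return static_cast<int>(limit / smem_bytes(tile_m, tile_n, tile_k, 1, elem_bytes));
}

// A configuration that does not fit fails to compile, so no runtime check
// is needed for it.
template <int TileM, int TileN, int TileK, int Stages, typename Element>
struct GemmConfig {
  static constexpr std::size_t kSmemBytes =
      smem_bytes(TileM, TileN, TileK, Stages, sizeof(Element));
  static_assert(kSmemBytes <= kSmemLimitBytes,
                "tile size and stage count exceed the shared memory budget");
};
```

Under these assumptions, `GemmConfig<128, 128, 32, 3, std::uint16_t>` compiles, while the same tile with 7 stages trips the `static_assert` at instantiation time instead of failing at launch.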
Why is it important to make sure the global address is aligned?
An unaligned address will cause an illegal memory access failure.
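As a rough illustration of the kind of runtime check involved, the sketch below tests whether operand pointers are multiples of the access width. The helpers `is_aligned` and `operands_aligned` are hypothetical names, not the CUTLASS API, and the 16-byte width in the usage note assumes the kernel issues 128-bit vector loads and stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `ptr` is a multiple of `alignment_bytes`.
// `alignment_bytes` must be a nonzero power of two.
inline bool is_aligned(const void* ptr, std::size_t alignment_bytes) {
  assert(alignment_bytes != 0 &&
         (alignment_bytes & (alignment_bytes - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment_bytes - 1)) == 0;
}

// Hypothetical pre-launch check: every operand must satisfy the access width
// the kernel was compiled for, otherwise the launch should be rejected.
inline bool operands_aligned(const void* a, const void* b, const void* c,
                             std::size_t access_bytes) {
  return is_aligned(a, access_bytes) &&
         is_aligned(b, access_bytes) &&
         is_aligned(c, access_bytes);
}
```

A caller could run `operands_aligned(ptr_a, ptr_b, ptr_c, 16)` before launching and fall back to a kernel with a narrower access width when it returns false. Unlike the shared memory footprint, pointer values are only known at run time, which is why this check cannot be done statically.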
https://github.com/NVIDIA/cutlass/blob/033d9efd2db0bbbcf3b3b0650acde6c472f3948e/include/cutlass/gemm/kernel/gemm.h#L153-L199
For a multistage pipeline, shared memory usage is proportional to the number of stages, so there is a maximum number of stages beyond which the kernel will fail to run. I checked the `can_implement` function, which seems to care only about the alignment of tensor addresses in global memory. Should shared memory usage be checked? Why is it important to make sure the global address is aligned?