NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] Tensor Shape Mismatch in CUTLASS: Does Layout Information Attach to Pointers? #1817

Open ziyuhuang123 opened 1 month ago

ziyuhuang123 commented 1 month ago

What is your question?

I encountered a strange bug.

First, my SMEM is divided into two regions: one for the mainloop (reading A and B) and one for the epilogue (writing C and D). We create a Tensor for mainloop.A, called gA_mkl. Because of some modifications to the code, we then reshape epilogue.smem_D and create a new Tensor from its pointer. Printing gA_mkl afterwards still shows the correct shape, for example: ArithTuple(_0,_0,_0,_0) o (_128,_64,1,3,1): (_1@0,_1@1,_128@0,_64@1,_1@2). However, shape<2>(gA_mkl), which should return the int value 1 (the third mode of (_128,_64,1,3,1)), instead returns the new, reshaped shape of epilogue.smem_D!

Logically, creating a Tensor should not attach anything to the pointer, since the pointer and the layout are distinct properties. Moreover, A and D live in completely different address regions within SMEM. Why is this bug happening?
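To make the mental model I am assuming concrete, here is a minimal plain-C++ sketch. It is not CuTe code: Int, Layout, Tensor, make_tensor, shape and third_mode_of_A_after_building_D are hypothetical stand-ins that only mirror the CuTe names. In this model each tensor owns its own copy of the layout next to the pointer, so building a second tensor over another part of the same buffer cannot change what shape<2> returns for the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical stand-in for a compile-time integer such as _128.
template <int N>
struct Int { static constexpr int value = N; };

// Hypothetical layout: one extent per mode.
// Static extents are Int<N>, dynamic extents are plain int.
template <class... Extents>
struct Layout { std::tuple<Extents...> shape; };

// Convert either kind of extent to a runtime int.
inline int to_int(int v) { return v; }
template <int N>
constexpr int to_int(Int<N>) { return N; }

// Assumed model: a tensor is a pointer plus its OWN copy of a layout.
// Nothing about the layout is stored in, or looked up through, the pointer.
template <class T, class LayoutT>
struct Tensor {
  T*      ptr;
  LayoutT layout;
};

template <class T, class... E>
Tensor<T, Layout<E...>> make_tensor(T* p, E... extents) {
  return {p, Layout<E...>{std::make_tuple(extents...)}};
}

// Extent of mode I, read from the tensor's own layout.
template <std::size_t I, class T, class L>
int shape(Tensor<T, L> const& t) {
  return to_int(std::get<I>(t.layout.shape));
}

// Mimics the scenario: one buffer, two regions, two independent tensors.
inline int third_mode_of_A_after_building_D() {
  float smem[1024] = {};
  auto gA = make_tensor(smem, Int<128>{}, Int<64>{}, 1, 3, 1);  // "mainloop" region
  auto sD = make_tensor(smem + 512, Int<32>{}, 7);              // "epilogue" region, reshaped
  (void)sD;                                                     // sD has no way to affect gA
  return shape<2>(gA);
}
```

If real CuTe tensors behave like this model, then the value I observe cannot come from the layout being shared through the pointer, and the cause would have to be somewhere else in my code (for example a shadowed name or an out-of-bounds write). I have not been able to confirm which.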

Previously, I noticed that for data arranged in SMEM using a swizzle pattern, when using the Tensor pointer (like ten_A.data()), there's no need to additionally compose the swizzle in subsequent operations. This suggests that the pointer is not "pure" but carries some attributes with it.
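A rough sketch of what I mean by a pointer that is not "pure" follows. This is my guess at the idea, not the CUTLASS implementation: SwizzledPtr and swizzle_travels_with_pointer are hypothetical names, and the Swizzle functor is a simplified XOR swizzle that assumes a non-negative shift. The point is that the swizzle lives in the pointer's type, so every access through it is remapped without the caller composing the swizzle again. The raw address value itself carries nothing extra.

```cpp
#include <cassert>
#include <cstdint>

// Simplified XOR swizzle (assumes S >= 0): take B bits starting at
// bit M+S and XOR them into the B bits starting at bit M.
template <int B, int M, int S>
struct Swizzle {
  static constexpr std::uint32_t mask = ((1u << B) - 1u) << (M + S);
  static constexpr std::uint32_t apply(std::uint32_t offset) {
    return offset ^ ((offset & mask) >> S);
  }
};

// Hypothetical pointer wrapper: the swizzle is a TYPE parameter, so it
// travels with the pointer object wherever the pointer is passed.
template <class T, class Sw>
struct SwizzledPtr {
  T* base;
  // Logical index in, swizzled physical index out.
  T& operator[](std::uint32_t i) const { return base[Sw::apply(i)]; }
};

// Write through the wrapper at logical index 64 and check where the
// value physically landed in the underlying buffer.
inline bool swizzle_travels_with_pointer() {
  float smem[512] = {};
  SwizzledPtr<float, Swizzle<3, 3, 3>> p{smem};
  p[64] = 1.0f;  // 64 ^ ((64 & 0x1C0) >> 3) == 64 ^ 8 == 72
  return smem[72] == 1.0f && smem[64] == 0.0f;
}
```

In this sketch only the swizzle function is attached to the pointer type. A layout (shape and strides) is still a separate object, which is why I would not expect a shape to leak from one tensor to another this way.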

Could it be that the layout is also somehow attached to the pointer? I'm curious how this is implemented at a low level in the CUTLASS library.
