[QST] Tensor Shape Mismatch in CUTLASS: Does Layout Information Attach to Pointers?

What is your question?

I encountered a strange bug.

My SMEM is divided into two regions: one for the mainloop (reading A and B) and one for the epilogue (writing C and D). We create a Tensor for the mainloop's A operand, gA_mkl. After some code modifications, we reshape epilogue.smem_D and create a new Tensor from its pointer. Printing gA_mkl then shows the correct shape: ArithTuple(_0,_0,_0,_0) o (_128,_64,1,3,1): (_1@0,_1@1,_128@0,_64@1,_1@2). However, shape<2>(gA_mkl), which should return an int with value 1, instead returns the newly reshaped shape of epilogue.smem_D!

Logically, a Tensor's layout should not be attached to its pointer, since the pointer and the layout are distinct properties. Moreover, A and D live in completely different address regions within SMEM. Why is this happening?
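To make the expectation concrete, here is a toy model in plain C++ (not CuTe code; ToyLayout, ToyTensor, toy_shape, and shapes_are_independent are hypothetical names made up for illustration). It assumes a tensor is simply a pointer plus a layout held by value, so building a second tensor with a different shape from another pointer, or even from the same pointer, cannot change what the first tensor reports:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy stand-in for a layout: only the shape is modeled, strides are omitted.
template <std::size_t Rank>
struct ToyLayout {
    std::array<int, Rank> shape;
};

// Toy stand-in for a tensor: a raw pointer plus a layout stored by value.
template <class T, std::size_t Rank>
struct ToyTensor {
    T* data;
    ToyLayout<Rank> layout;
};

// Returns the extent of mode I, read from the tensor's own layout.
template <std::size_t I, class T, std::size_t Rank>
int toy_shape(const ToyTensor<T, Rank>& t) {
    static_assert(I < Rank, "mode index out of range");
    return t.layout.shape[I];
}

// Returns true when each tensor keeps its own shape.
inline bool shapes_are_independent() {
    static float smem[256];               // stand-in for one shared-memory allocation
    float* mainloop_region = smem;        // region used for A and B
    float* epilogue_region = smem + 128;  // region used for C and D

    ToyTensor<float, 5> tA{mainloop_region, ToyLayout<5>{{128, 64, 1, 3, 1}}};
    ToyTensor<float, 2> tD{epilogue_region, ToyLayout<2>{{16, 8}}};

    // "Reshape" D: a new tensor from the same pointer with a different layout.
    ToyTensor<float, 2> tD_reshaped{tD.data, ToyLayout<2>{{32, 4}}};

    return toy_shape<2>(tA) == 1               // A is unaffected by the reshape of D
        && toy_shape<0>(tD) == 16              // the original D view is unchanged
        && toy_shape<0>(tD_reshaped) == 32     // the new view has the new shape
        && tD_reshaped.data == tD.data;        // both D views share one pointer
}
```

Calling shapes_are_independent() returns true in this model, which is the behavior the question expects from the real Tensor type.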

Previously, I noticed that for data arranged in SMEM with a swizzle pattern, once I take the Tensor's pointer (e.g. ten_A.data()), subsequent operations do not need to compose the swizzle again. This suggests that the pointer is not "pure" but carries some attributes with it.
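A toy sketch of how a pointer-like type could carry a swizzle with it (again plain C++ and not CuTe's actual implementation; ToySwizzle, ToySwizzlePtr, and swizzle_travels_with_pointer are hypothetical names, and the XOR is applied to an element offset purely for illustration). If the swizzle is part of the iterator's type, any view built on that iterator gets swizzled accesses without composing the swizzle into its layout again:

```cpp
#include <cassert>
#include <cstddef>

// Toy XOR swizzle: takes Bits bits starting at bit Shift of the offset
// and XORs them into the lowest Bits bits.
template <int Bits, int Shift>
struct ToySwizzle {
    static constexpr std::size_t apply(std::size_t offset) {
        constexpr std::size_t mask = ((std::size_t{1} << Bits) - 1) << Shift;
        return offset ^ ((offset & mask) >> Shift);
    }
};

// Pointer-like wrapper whose type records the swizzle.
// Indexing applies the swizzle to the logical offset before dereferencing.
template <class T, class Swizzle>
struct ToySwizzlePtr {
    T* base;
    T& operator[](std::size_t offset) const {
        return base[Swizzle::apply(offset)];
    }
};

// Returns true when accesses through the wrapper land on swizzled offsets.
inline bool swizzle_travels_with_pointer() {
    static int smem[64];
    for (int i = 0; i < 64; ++i) smem[i] = i;  // element value == physical offset

    ToySwizzlePtr<int, ToySwizzle<2, 3>> p{smem};

    // Logical offset 9 = 0b01001: bits [3,5) hold 0b01, so physical = 9 ^ 1 = 8.
    // Logical offset 3 = 0b00011: bits [3,5) are zero, so physical = 3.
    return p[9] == 8 && p[3] == 3;
}
```

In this model only the swizzle function lives in the pointer's type; the shape and strides still belong to whichever layout is paired with the pointer.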

Could it be that the layout is also somehow attached to the pointer? I'm curious how this is implemented at a low level in the CUTLASS library.
