NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[BUG] ASM instruction error #690

Closed renderist closed 1 year ago

renderist commented 1 year ago

Describe the bug: On line 454 of include/cutlass/arch/memory.h, the asm instruction ld.shared.v4.u32 looks like it should be st.shared.v4.u32. It appears to be a copy-and-paste error.

Steps/Code to reproduce bug: Visual inspection. The function is shared_store<16>(), so I would expect it to use a store instruction rather than a load instruction.

Additional context: Appears in the current public-facing GitHub repository.
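
For illustration, here is a minimal sketch of a 16-byte shared-memory store that uses the store form of the instruction, st.shared.v4.u32. It is not the CUTLASS source: the names shared_store_16_sketch, roundtrip_kernel, and run_shared_store_roundtrip are hypothetical, and the sketch assumes a toolkit that provides __cvta_generic_to_shared and a 16-byte-aligned source pointer.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper (not the CUTLASS function): writes 16 bytes from `src`
// to the shared-memory address `smem_addr` with a single vectorized store.
// `src` is assumed to be 16-byte aligned.
__device__ inline void shared_store_16_sketch(uint32_t smem_addr, void const *src) {
  uint4 const *v = reinterpret_cast<uint4 const *>(src);
  // st = store. With ld here, the values would be treated as load destinations
  // instead of being written to shared memory.
  asm volatile("st.shared.v4.u32 [%0], {%1, %2, %3, %4};\n"
               :
               : "r"(smem_addr), "r"(v->x), "r"(v->y), "r"(v->z), "r"(v->w)
               : "memory");
}

// Stores one uint4 into shared memory through the helper, then reads it back
// with an ordinary load so the host can check the round trip.
__global__ void roundtrip_kernel(uint4 const *in, uint4 *out) {
  __shared__ uint4 tile;  // uint4 is 16-byte aligned
  // Convert the generic pointer to a 32-bit shared-state-space address.
  uint32_t addr = static_cast<uint32_t>(__cvta_generic_to_shared(&tile));
  shared_store_16_sketch(addr, in);
  __syncthreads();
  *out = tile;
}

// Host-side check: returns true if the stored data came back unchanged.
bool run_shared_store_roundtrip() {
  uint4 h_in = make_uint4(1u, 2u, 3u, 4u);
  uint4 h_out = make_uint4(0u, 0u, 0u, 0u);
  uint4 *d_in = nullptr;
  uint4 *d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(uint4)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, sizeof(uint4)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, &h_in, sizeof(uint4), cudaMemcpyHostToDevice);
  cudaMemset(d_out, 0, sizeof(uint4));
  roundtrip_kernel<<<1, 1>>>(d_in, d_out);
  // This blocking copy also waits for the kernel to finish.
  cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(uint4), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return err == cudaSuccess &&
         h_out.x == 1u && h_out.y == 2u && h_out.z == 3u && h_out.w == 4u;
}
```

Calling run_shared_store_roundtrip() from a host main() and checking that it returns true gives a quick test that such a store path actually writes to shared memory.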

hwu36 commented 1 year ago

yes, you are right. we will fix in the next release. thank you.

renderist commented 1 year ago

Just curious, is that function being called at all in your tests?

On Nov 7, 2022, at 8:51 PM, Haicheng Wu wrote:

yes, you are right. we will fix in the next release. thank you.

hwu36 commented 1 year ago

No, this function is not needed in real kernels, so it is not tested.

mnicely commented 1 year ago

Fixed in 2.11