NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] Tiled copy misaligned, how to solve it? #1561

Open 4grass opened 1 month ago

4grass commented 1 month ago

My code:

using GmemTiledCopyL = decltype(make_tiled_copy(
      Copy_Atom<DefaultCopy, half>{}, Layout<Shape<_1>>{}, Layout<Shape<_1>>{}));
using SmemLayoutL = decltype(Layout<Shape<Int<64>>>{});
__shared__ cute::array_aligned<half, cute::cosize_v<SmemLayoutL>> l;
GmemTiledCopyL gmem_tiled_copy_L;

auto gmem_thr_copy_LD = gmem_tiled_copy_L.get_thread_slice(tid);
Tensor _L = make_tensor(make_gmem_ptr(reinterpret_cast<const float*>(L)),
                          make_shape(C), make_stride(Int<1>{}));
Tensor gL = local_tile(_L, make_tile(Int<64>{}),
                         make_coord(_));  // [64, C / 64]
Tensor sL = make_tensor(make_smem_ptr(shared_storage.l.data()),
                          SmemLayoutL{});  // [64]
Tensor gL_to_sL_src = gmem_thr_copy_LD.partition_S(gL);
Tensor gL_to_sL_dst = gmem_thr_copy_LD.partition_D(sL);
cute::copy(gmem_tiled_copy_L, gL_to_sL_src(_, _, 0), gL_to_sL_dst);

If C is not divisible by 8, the kernel fails with: CUDA get error, code:716, msg:misaligned address. How can I solve it? Thanks.
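
The "divisible by 8" condition is consistent with 16-byte (128-bit) vectorized accesses on 2-byte half elements, since 8 halves occupy 16 bytes and CUDA requires each access to be naturally aligned. The sketch below is plain host-side arithmetic, not CuTe code. It assumes, without confirmation from the snippet, that the pointer L is an offset of a whole number of C-element rows into a 16-byte-aligned buffer. Both helper functions are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a CuTe API: widest power-of-two access (in bytes,
// capped at 16) that is naturally aligned at `byte_offset` from a
// 16-byte-aligned base address.
inline std::size_t max_aligned_access_bytes(std::size_t byte_offset) {
    std::size_t width = 16;
    while (width > 1 && byte_offset % width != 0) {
        width /= 2;
    }
    return width;
}

// Assumption: row `row` of a [rows, C] array of 2-byte half values starts
// at element offset row * C from the aligned base.
inline std::size_t row_start_access_bytes(std::size_t row, std::size_t C) {
    const std::size_t half_bytes = 2;
    assert(C > 0);
    return max_aligned_access_bytes(row * C * half_bytes);
}
```

Under these assumptions, C = 64 lets every row start with a 16-byte access, while C = 60 limits row 1 to 8-byte accesses, so a 16-byte load issued there would be misaligned. If this is what happens in the kernel, the likely options are padding C to a multiple of 8 or using a copy that does not assume 128-bit alignment.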

github-actions[bot] commented 4 weeks ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.