NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[BUG] my code compiles well in `git reset --hard v3.2.1`, but errors `git reset --hard v3.4.1` #1804

Open zhoutianzi666 opened 2 months ago

zhoutianzi666 commented 2 months ago

Describe the bug

My code compiles against CUTLASS v3.2.1 (`git reset --hard v3.2.1`) but fails to compile against v3.4.1 (`git reset --hard v3.4.1`).

Steps/Code to reproduce bug


#include <cuda.h>
#include <stdlib.h>

#include <cute/tensor.hpp>

#define PRINT(name, content) \
    print(name);             \
    print(" : ");            \
    print(content);          \
    print("\n");

#define PRINTTENSOR(name, content) \
    print(name);                   \
    print(" : ");                  \
    print_tensor(content);         \
    print("\n");

using namespace cute;
int main() {
    using T = cute::half_t;
    using s2r_copy_op = SM75_U32x4_LDSM_N;
    using s2r_copy_traits = Copy_Traits<s2r_copy_op>;
    using s2r_copy_atom = Copy_Atom<s2r_copy_traits, T>;

    using S2RCopyAtomA = s2r_copy_atom;
    using S2RCopyAtomB = s2r_copy_atom;

    // mma
    using mma_op = SM80_16x8x16_F32F16F16F32_TN;
    using mma_traits = MMA_Traits<mma_op>;
    using mma_atom = MMA_Atom<mma_traits>;
    static constexpr int kMmaEURepeatM = 1;
    static constexpr int kMmaEURepeatN = 1;
    static constexpr int kMmaEURepeatK = 1;

    using mma_atom_shape = mma_traits::Shape_MNK;
    static constexpr int kMmaPM = 1;
    static constexpr int kMmaPN = 1;
    static constexpr int kMmaPK = 2;
    using MMA_EU_RepeatT = decltype(make_layout(make_shape(
        Int<kMmaEURepeatM>{}, Int<kMmaEURepeatN>{}, Int<kMmaEURepeatK>{})));
    using MMA_P_T = Tile<Int<kMmaPM>, Int<kMmaPN>, Int<kMmaPK>>;
    using MMA = decltype(make_tiled_mma(mma_atom{}, MMA_EU_RepeatT{}, MMA_P_T{}));
    auto s2r_tiled_copy_a = make_tiled_copy_A(S2RCopyAtomA{}, MMA{});
    auto s2r_tiled_copy_b = make_tiled_copy_B(S2RCopyAtomB{}, MMA{});

    // This prints the src-to-dst mapping!
    //print_latex(s2r_tiled_copy_a);
    print_latex(s2r_tiled_copy_b);
}
ccecka commented 2 months ago

The last parameter to make_tiled_mma was updated to be the full MMA Tile shape/permutation.

You can likely either omit it, or use the Shape that you actually intend:

    using MMA = decltype(make_tiled_mma(mma_atom{}, MMA_EU_RepeatT{}));

or

    using MMA_P_T = Tile<_16, _8, _32>;   // 16x8x32 Value Tile for a 16x8x16 Atom
    using MMA = decltype(make_tiled_mma(mma_atom{}, MMA_EU_RepeatT{}, MMA_P_T{}));
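
To make the arithmetic behind the `Tile<_16, _8, _32>` suggestion concrete, here is a minimal sketch in plain C++ (no CuTe headers). It assumes the old third argument acted as a per-atom repeat count in each of M, N, K, which is what the "16x8x32 Value Tile for a 16x8x16 Atom" comment suggests. `ShapeMNK` and `full_tile` are hypothetical names for illustration only; they are not part of CUTLASS or CuTe.

```cpp
// Plain C++ stand-in for an MNK extent. This is NOT a CuTe type.
struct ShapeMNK {
    int m, n, k;
};

// Hypothetical helper (assumption, not a CuTe API): multiplies the atom shape
// by the atom-layout repeat and the per-atom value repeat to get the full
// tile extent in elements along M, N and K.
constexpr ShapeMNK full_tile(ShapeMNK atom, ShapeMNK atom_layout, ShapeMNK value_repeat) {
    return ShapeMNK{atom.m * atom_layout.m * value_repeat.m,
                    atom.n * atom_layout.n * value_repeat.n,
                    atom.k * atom_layout.k * value_repeat.k};
}

// Values taken from the reproducer in this issue.
constexpr ShapeMNK kAtom{16, 8, 16};       // shape of SM80_16x8x16_F32F16F16F32_TN
constexpr ShapeMNK kAtomLayout{1, 1, 1};   // kMmaEURepeatM / N / K
constexpr ShapeMNK kValueRepeat{1, 1, 2};  // kMmaPM / PN / PK (old-style third argument)

// Full tile extent that the new-style third argument would have to spell out.
constexpr ShapeMNK kTile = full_tile(kAtom, kAtomLayout, kValueRepeat);

// Checked at compile time: 16*1*1, 8*1*1, 16*1*2.
static_assert(kTile.m == 16 && kTile.n == 8 && kTile.k == 32,
              "expected a 16x8x32 tile for this configuration");
```

Under that assumption, the old `Tile<Int<1>, Int<1>, Int<2>>` corresponds to a full tile of 16x8x32, which matches the `Tile<_16, _8, _32>` shown above. If the repeat counts in the reproducer change, the same multiplication gives the extents to pass instead.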