NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST]41_fused_multi_head_attention on sm89 #1597

Open Sunny-bot1 opened 5 months ago

Sunny-bot1 commented 5 months ago

What is your question?

When I compile the example for sm89 with:

int run_attention(Options& options) {
  using Attention = AttentionKernel<
    cutlass::half_t,      // scalar_t
    cutlass::arch::Sm89,  // ArchTag
    true,                 // Memory is aligned
    kQueriesPerBlock,
    kKeysPerBlock,
    kMaxK,
    false,                // Supports dropout
    false                 // Supports bias
  >;

I get a compile error:

Building CUDA object examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_fixed_seqlen.dir/fused_multihead_attention_fixed_seqlen.cu.o
/cutlass/examples/41_fused_multi_head_attention/kernel_forward.h(409): error: incomplete type is not allowed
          kIsAligned ? DefaultConfig::kAlignmentA : GemmType::kMinimumAlignment;
                       ^
          detected during:
            instantiation of class "AttentionKernel<scalar_t_, ArchTag, isAligned_, kQueriesPerBlock_, kKeysPerBlock_, kMaxK_, kSupportsDropout_, kSupportsBias_, ToBatchHookType_>::MM0 [with scalar_t_=cutlass::half_t, ArchTag=cutlass::arch::Sm89, isAligned_=true, kQueriesPerBlock_=32, kKeysPerBlock_=128, kMaxK_=128, kSupportsDropout_=false, kSupportsBias_=false, ToBatchHookType_=DefaultToBatchHook]" at line 418
            instantiation of class "AttentionKernel<scalar_t_, ArchTag, isAligned_, kQueriesPerBlock_, kKeysPerBlock_, kMaxK_, kSupportsDropout_, kSupportsBias_, ToBatchHookType_>::MM0 [with scalar_t_=cutlass::half_t, ArchTag=cutlass::arch::Sm89, isAligned_=true, kQueriesPerBlock_=32, kKeysPerBlock_=128, kMaxK_=128, kSupportsDropout_=false, kSupportsBias_=false, ToBatchHookType_=DefaultToBatchHook]" at line 1161
            instantiation of class "AttentionKernel<scalar_t_, ArchTag, isAligned_, kQueriesPerBlock_, kKeysPerBlock_, kMaxK_, kSupportsDropout_, kSupportsBias_, ToBatchHookType_> [with scalar_t_=cutlass::half_t, ArchTag=cutlass::arch::Sm89, isAligned_=true, kQueriesPerBlock_=32, kKeysPerBlock_=128, kMaxK_=128, kSupportsDropout_=false, kSupportsBias_=false, ToBatchHookType_=DefaultToBatchHook]" at line 329 of /home/sunxin20/cutlass/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu
            instantiation of class "TestbedAttention<Attention> [with Attention=AttentionKernel<cutlass::half_t, cutlass::arch::Sm89, true, 32, 128, 128, false, false, DefaultToBatchHook>]" at line 1032 of /home/sunxin20/cutlass/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu
            instantiation of "int run_attention<kQueriesPerBlock,kKeysPerBlock,kMaxK>(Options &) [with kQueriesPerBlock=32, kKeysPerBlock=128, kMaxK=128]" at line 1110 of /home/sunxin20/cutlass/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu
thakkarV commented 5 months ago

Just set CUTLASS_NVCC_ARCHS=89; do not edit this file yourself. There is no partial template specialization for the SM89 arch tag.
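In build terms, that advice amounts to something like the following; a minimal sketch, assuming a fresh CMake build directory inside the cutlass checkout (the target name is taken from the build log above):

```shell
# Configure from a clean build directory, targeting SM89 at build time
# instead of changing the ArchTag in the source. The example keeps its
# original cutlass::arch::Sm80 tag; SM89 devices run the SM80 kernels.
mkdir -p build && cd build
cmake .. -DCUTLASS_NVCC_ARCHS=89
make 41_fused_multi_head_attention_fixed_seqlen
```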

Sunny-bot1 commented 5 months ago

> just change your CUTLASS_NVCC_ARCHS=89, do not edit this file yourself. There is no partial template specialization for SM89 arch tag.

I see. Thank you! If I want to support the fp8 input type, what should I pay attention to?

github-actions[bot] commented 3 months ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 1 week ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.