NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[BUG] ElementC=void kernel reads non-void in `GemmDescription` #1633

Open manishucsd opened 3 months ago

manishucsd commented 3 months ago

I am observing `gemm_desc.C.element = bf16` even though I set it to `void`.

Please use the following debug_branch

Please check whether the print below (`description_.C.element bf16`) is expected.

We are choosing the incorrect kernel because of this. Is this a bug, or am I messing something up?

./tools/profiler/cutlass_profiler --kernels=cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_void_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_epi_nosmem
description_.C.element bf16       // Is this expected?
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16

=============================
  Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_void_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_epi_nosmem

          Status: Success
    Verification: ON
     Disposition: Passed

reference_device: Passed
          cuBLAS: Not run
           cuDNN: Not run

       Arguments: --gemm_kind=universal --m=1024 --n=1024 --k=1024 --A=fe4m3:row --B=fe4m3:column --C=bf16:column --D=bf16:column  \
                  --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic  \
                  --op_class=tensorop --accum=f32 --cta_m=128 --cta_n=128 --cta_k=128 --cluster_m=1 --cluster_n=2 --cluster_k=1  \
                  --stages=7 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=64 --inst_n=128 --inst_k=32 --min_cc=90 --max_cc=90  \

           Bytes: 4194304  bytes
           FLOPs: 2149580800  flops
           FLOPs/Byte: 512

         Runtime: 0.0113731  ms
          Memory: 343.463 GiB/s

            Math: 189005 GFLOP/s

=============================

CSV Results:
manishucsd commented 3 months ago

@thakkarV , @IonThruster , @hwu36 , @mnicely

thakkarV commented 2 months ago

@mnicely can we commit this for 3.6 please?

mnicely commented 2 months ago

Yes.

github-actions[bot] commented 1 month ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.