NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Tracking perf optimization of `HopperMatmulTest.HSH_NT_128BSwizzle` #3137

Open zasdfgbnm opened 6 days ago

zasdfgbnm commented 6 days ago

This issue tracks the perf of one specific shape in this unit test, measured with `nsys nvprof`.

TOT main branch (as measured here):

 Time (%)  Total Time (ns)  Instances  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)                                                  Name
 --------  ---------------  ---------  ---------  ---------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     20.1          1401712          1  1401712.0  1401712.0   1401712   1401712          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
     11.2           780952          1   780952.0   780952.0    780952    780952          0.0  nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN

Perf nvFuser/cuBLAS: 55.7%

zasdfgbnm commented 6 days ago

Initial perf:

 Time (%)  Total Time (ns)  Instances  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)                                                  Name
 --------  ---------------  ---------  ---------  ---------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     34.9          2974662          1  2974662.0  2974662.0   2974662   2974662          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
      9.2           780729          1   780729.0   780729.0    780729    780729          0.0  nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN

Perf nvFuser/cuBLAS: 26%

zasdfgbnm commented 6 days ago

After https://github.com/NVIDIA/Fuser/pull/3136:

 Time (%)  Total Time (ns)  Instances  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)                                                  Name                                                
 --------  ---------------  ---------  ---------  ---------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     24.1          1766866          1  1766866.0  1766866.0   1766866   1766866          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
     10.6           776826          1   776826.0   776826.0    776826    776826          0.0  nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN

Perf nvFuser/cuBLAS: 44%

zasdfgbnm commented 4 days ago

After https://github.com/NVIDIA/Fuser/pull/3153:

 Time (%)  Total Time (ns)  Instances  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)                                                  Name
 --------  ---------------  ---------  ---------  ---------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     16.5          1498164          1  1498164.0  1498164.0   1498164   1498164          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
      8.7           788922          1   788922.0   788922.0    788922    788922          0.0  nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN

Perf nvFuser/cuBLAS: 52.7%

zasdfgbnm commented 4 days ago

After https://github.com/NVIDIA/Fuser/pull/3155:

 Time (%)  Total Time (ns)  Instances  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)                                                  Name
 --------  ---------------  ---------  ---------  ---------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     20.1          1401712          1  1401712.0  1401712.0   1401712   1401712          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
     11.2           780952          1   780952.0   780952.0    780952    780952          0.0  nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN

Perf nvFuser/cuBLAS: 55.7%
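
For reference, the "Perf nvFuser/cuBLAS" figures in this thread appear to be the cuBLAS kernel time divided by the nvFuser kernel time, so 100% would mean parity with cuBLAS. A minimal sketch that recomputes them from the `Total Time (ns)` column of the tables above; the helper name `relative_perf` and the stage labels are made up for illustration and are not part of nvFuser or nsys:

```python
def relative_perf(nvfuser_ns: int, cublas_ns: int) -> float:
    """nvFuser speed as a percentage of cuBLAS (assumed definition: cuBLAS time / nvFuser time)."""
    return 100.0 * cublas_ns / nvfuser_ns


# (nvFuser kernel ns, cuBLAS kernel ns), copied from the nsys tables in this thread.
measurements = {
    "initial": (2974662, 780729),
    "after #3136": (1766866, 776826),
    "after #3153": (1498164, 788922),
    "after #3155": (1401712, 780952),
}

for stage, (nvfuser_ns, cublas_ns) in measurements.items():
    print(f"{stage}: {relative_perf(nvfuser_ns, cublas_ns):.1f}%")
```

Under that assumed definition, this matches the percentages quoted in each comment to within rounding. Each table reports a single kernel instance, so small run-to-run differences (the cuBLAS time itself varies by about 1.5% across the runs) are likely noise.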