NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Change shape of `HSH_NT_128BSwizzle` #3281

Closed zasdfgbnm closed 3 weeks ago

zasdfgbnm commented 3 weeks ago

This shape makes more sense: https://github.com/NVIDIA/Fuser/issues/3137#issuecomment-2438559998, https://github.com/NVIDIA/Fuser/issues/3279

Perf:

 Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                                  Name
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     43.2           205150          1  205150.0  205150.0    205150    205150          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
     18.5            87550          1   87550.0   87550.0     87550     87550          0.0  nvjet_hsh_256x128_64x4_1x2_h_bz_coopA_NTT

nvFuser/cuBLAS = 42.7% (relative performance: cuBLAS time 87550 ns / nvFuser time 205150 ns)
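
For reference, a minimal sketch of how the ratio can be reproduced from the two rows of the profile. The helper name `relative_perf` is hypothetical and not part of nvFuser or nsys. It assumes the percentage is a speed ratio (baseline time divided by candidate time), which is the reading that matches the reported number.

```python
# Kernel times in nanoseconds, copied from the nsys summary in this comment.
NVFUSER_NS = 205150  # nvfuser_none_f0_c0_r0_g0
CUBLAS_NS = 87550    # nvjet_hsh_256x128_64x4_1x2_h_bz_coopA_NTT


def relative_perf(candidate_ns: float, baseline_ns: float) -> float:
    """Speed of the candidate relative to the baseline (hypothetical helper).

    Assumes performance is inversely proportional to kernel time, so the
    ratio is baseline time over candidate time.
    """
    return baseline_ns / candidate_ns


ratio = relative_perf(NVFUSER_NS, CUBLAS_NS)
print(f"nvFuser/cuBLAS = {ratio:.1%}")
```

Both kernels ran once (Instances = 1), so this is a single-sample comparison rather than an average over repeated runs.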

zasdfgbnm commented 3 weeks ago

!build