Closed zasdfgbnm closed 3 weeks ago
This shape makes more sense: https://github.com/NVIDIA/Fuser/issues/3137#issuecomment-2438559998, https://github.com/NVIDIA/Fuser/issues/3279
Perf:
| Time (%) | Total Time (ns) | Instances | Avg (ns) | Med (ns) | Min (ns) | Max (ns) | StdDev (ns) | Name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 43.2 | 205150 | 1 | 205150.0 | 205150.0 | 205150 | 205150 | 0.0 | `<unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…` |
| 18.5 | 87550 | 1 | 87550.0 | 87550.0 | 87550 | 87550 | 0.0 | `nvjet_hsh_256x128_64x4_1x2_h_bz_coopA_NTT` |
nvFuser/cuBLAS = 42.7%
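The reported figure is consistent with dividing the cuBLAS kernel time by the nvFuser kernel time from the profile above, so it reads as nvFuser's throughput relative to cuBLAS. A minimal sketch of that arithmetic follows; the helper name `relative_perf` is hypothetical and the interpretation of the ratio is an assumption inferred from the numbers, not something the PR states.

```python
# Kernel times in nanoseconds, copied from the profile summary above.
nvfuser_ns = 205150  # nvfuser_none_f0_c0_r0_g0
cublas_ns = 87550    # nvjet_hsh_256x128_64x4_1x2_h_bz_coopA_NTT


def relative_perf(candidate_ns: float, baseline_ns: float) -> float:
    """Hypothetical helper: throughput of the candidate kernel relative to
    the baseline, as a percentage (a slower candidate gives less than 100)."""
    return 100.0 * baseline_ns / candidate_ns


ratio = relative_perf(nvfuser_ns, cublas_ns)
print(f"nvFuser/cuBLAS = {ratio:.1f}%")
```

Both kernels ran once (Instances = 1), so this is a single-sample comparison.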
!build