Open zasdfgbnm opened 6 days ago
Initial perf:
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
34.9 2974662 1 2974662.0 2974662.0 2974662 2974662 0.0 <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
9.2 780729 1 780729.0 780729.0 780729 780729 0.0 nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN
Perf nvFuser/cuBLAS: 26%
After https://github.com/NVIDIA/Fuser/pull/3136:
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
24.1 1766866 1 1766866.0 1766866.0 1766866 1766866 0.0 <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
10.6 776826 1 776826.0 776826.0 776826 776826 0.0 nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN
Perf nvFuser/cuBLAS: 44%
After https://github.com/NVIDIA/Fuser/pull/3153:
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
16.5 1498164 1 1498164.0 1498164.0 1498164 1498164 0.0 <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
8.7 788922 1 788922.0 788922.0 788922 788922 0.0 nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN
Perf nvFuser/cuBLAS: 52.7%
After https://github.com/NVIDIA/Fuser/pull/3155:
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
20.1 1401712 1 1401712.0 1401712.0 1401712 1401712 0.0 <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
11.2 780952 1 780952.0 780952.0 780952 780952 0.0 nvjet_hsh_192x192_64x3_2x1_v_bz_coopB_NTN
Perf nvFuser/cuBLAS: 55.7%
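The "Perf nvFuser/cuBLAS" percentages above appear to be the cuBLAS kernel time divided by the nvFuser kernel time, so 100% would mean parity with cuBLAS. A minimal sketch that reproduces them from the numbers in the nsys tables follows; the helper name `relative_perf` and the labels are illustrative and are not part of nvFuser or nsys.

```python
def relative_perf(cublas_ns: int, nvfuser_ns: int) -> float:
    """nvFuser speed as a percentage of cuBLAS: cuBLAS time / nvFuser time."""
    return 100.0 * cublas_ns / nvfuser_ns


# (label, nvFuser kernel time in ns, cuBLAS kernel time in ns),
# copied from the "Total Time (ns)" column of the tables above.
measurements = [
    ("initial", 2974662, 780729),
    ("after #3136", 1766866, 776826),
    ("after #3153", 1498164, 788922),
    ("after #3155", 1401712, 780952),
]

for label, nvfuser_ns, cublas_ns in measurements:
    print(f"{label}: {relative_perf(cublas_ns, nvfuser_ns):.1f}%")
```

Each table has a single instance per kernel, so these ratios come from one run each and small differences (for example, the cuBLAS time varying between 776826 ns and 788922 ns) are likely run-to-run noise.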
Tracking the perf of a specific shape in this specific unit test using nsys nvprof.
TOT main branch (as measured here):
Perf nvFuser/cuBLAS: 55.7%