NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Other
271 stars 53 forks source link

Add support for stmatrix in the unit test HopperMatmulTest/HSH_NT_128BSwizzle #3411

Open protonu opened 6 days ago

protonu commented 6 days ago

This demonstrates the use of stmatrix in a multi-tile hopper matmul.

protonu commented 2 days ago

Observations:

  1. On running getBankConflictInfo, I see both cases (mma with and without a stmatrix) have zero bank conflicts. The codes I used was:

    KernelExecutor ke;
    auto launch_constraints = LaunchParams();
    ke.compile(
      &fusion,
      {inputs.first, inputs.second},
      launch_constraints,
      matmul_cparams);
    
    auto bank_conflict_info =
      getBankConflictInfo(ke.kernel(), launch_constraints);
    
    if (bank_conflict_info.empty()) {
    debug() << "===== No bank confliction =====" << std::endl;
    } else {
    debug() << "======= Bank confliction =======" << std::endl;
    for (auto info : bank_conflict_info) {
      debug() << "Expr: " << info.first->toString() << std::endl;
      auto conflict = info.second;
      if (conflict.first > 1) {
        debug() << "input conflict: " << conflict.first << " way, ";
      }
      if (conflict.second > 1) {
        debug() << "output conflict: " << conflict.second << " way";
      }
      debug() << std::endl;
    }
    debug() << "================================" << std::endl;
    }
    fusion.printKernel();
  2. Performance sees a minor degradation:

Without stmatrix Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name


 36.8           156830          1  156830.0  156830.0    156830    156830          0.0  <unnamed>::nvfuser_none_f0_c0_r0_g0(<unnamed>::Tensor<<unnamed>::__half, (int)3, (int)3>, <unnamed>…
 23.1            98654          1   98654.0   98654.0     98654     98654          0.0  nvjet_hsh_256x128_64x4_1x2_h_bz_coopA_NTT

Where stmatrix was about 63%

With stmatrix this falls to about 59-60%

TODO: Use nsight to look at bank conflict data.