NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

[WIP] Enable translation of Hopper matmuls #3440

Open jacobhinkle opened 3 days ago

jacobhinkle commented 3 days ago

Stacked on #3406.

This enables Hopper matmuls in our automatic scheduler by translating them without introducing new broadcasts. Specifically:

  1. Update mma_utils::MatmulPattern::translateToMmaOp to optionally avoid intermediates by using an MmaOp::AxisMapping. Enable this option when the target arch is not Ampere or Turing.
  2. Update the default heuristic to give a usable configuration for Hopper.
  3. Unguard tests in test_translate_mma.cpp.
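
For reviewers, the arch gating described in item 1 amounts to roughly the following. This is a simplified standalone sketch, not the code in this PR: GpuArch and avoidIntermediates are hypothetical names chosen for illustration, and the real decision is made inside mma_utils::MatmulPattern::translateToMmaOp.

```cpp
// Hypothetical sketch: GpuArch and avoidIntermediates are illustrative
// names and are NOT part of nvFuser's API.
enum class GpuArch { Turing, Ampere, Hopper };

// Returns true when translation should skip the intermediate broadcast
// tensors and describe the operand axes with an axis mapping instead.
// Per the description above, this is enabled for every target arch
// except Ampere and Turing.
constexpr bool avoidIntermediates(GpuArch arch) {
  return arch != GpuArch::Turing && arch != GpuArch::Ampere;
}

// Compile-time checks of the gating rule.
static_assert(avoidIntermediates(GpuArch::Hopper),
              "Hopper should translate without new broadcasts");
static_assert(!avoidIntermediates(GpuArch::Ampere),
              "Ampere keeps the existing translation path");
static_assert(!avoidIntermediates(GpuArch::Turing),
              "Turing keeps the existing translation path");
```

Writing the condition as "not Ampere or Turing" rather than "is Hopper" means any arch added to the enum later would take the no-broadcast path by default, which matches the wording of item 1.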