This enables Hopper matmuls in our automatic scheduler by translating matmul patterns to MmaOp without introducing new broadcasts. Specifically:
Update mma_utils::MatmulPattern::translateToMmaOp to optionally avoid intermediates by using an MmaOp::AxisMapping. Enable this option when the target arch is not Ampere or Turing.
Update the default heuristic to give a usable configuration for Hopper.
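To illustrate the idea behind the first item, here is a minimal standalone sketch. It does not use nvFuser's actual API: `Arch`, `avoidIntermediates`, and `mapOperandAxes` are hypothetical names, and the use of -1 to mark an axis that an operand lacks is an assumption made for illustration. The point is that each output axis records where it lives in each operand, so an axis the operand does not have is simply marked absent rather than materialized as a broadcast.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical architecture tag, for illustration only.
enum class Arch { Turing, Ampere, Hopper };

// Mirrors the rule in the description: avoid intermediates (new broadcasts)
// whenever the target arch is not Ampere or Turing.
inline bool avoidIntermediates(Arch arch) {
  return arch != Arch::Ampere && arch != Arch::Turing;
}

// For each output axis (one character label per axis), return the position of
// the same axis in the operand, or -1 if the operand has no such axis.
// ASSUMPTION: the -1 "missing axis" convention is illustrative, not taken
// from the source.
inline std::vector<int64_t> mapOperandAxes(
    const std::string& out_axes,
    const std::string& operand_axes) {
  std::vector<int64_t> mapping;
  mapping.reserve(out_axes.size());
  for (char axis : out_axes) {
    const auto pos = operand_axes.find(axis);
    mapping.push_back(
        pos == std::string::npos ? -1 : static_cast<int64_t>(pos));
  }
  return mapping;
}
```

For A[M, K] times B[K, N] with output axes ordered [M, N, K], `mapOperandAxes("MNK", "MK")` gives the mapping for A and `mapOperandAxes("MNK", "KN")` gives the mapping for B. Neither operand needs an extra broadcast axis inserted.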
Stacked on #3406.