NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Additional features to fill out `ops.matmul` #2092

Closed Priya2698 closed 4 months ago

Priya2698 commented 5 months ago

We currently only support 2D inputs for matmul. Extend the implementation to support the following additional cases (a shape-semantics sketch follows the references below):

  1. Both A and B are 1D ([M,] x [M,])
  2. A is 1D and B is 2D ([M,] x [M, N])
  3. A is 2D and B is 1D ([M, N] x [N,])
  4. A and B are at least 1D, with one of the matrices having more than two dimensions ([B, M, N] x [N,]).

Torch reference: https://pytorch.org/docs/stable/generated/torch.matmul.html
Thunder reference: https://github.com/Lightning-AI/lightning-thunder/blob/a28575345fcdc18bf4b9163dfb239195dca9f34d/thunder/tests/opinfos.py#L5299
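
A minimal PyTorch snippet illustrating the expected shape semantics for the four cases above (dimension sizes are arbitrary, chosen only for illustration):

```python
import torch

M, N, K, B = 4, 5, 6, 3

# 1. 1D x 1D -> 0-dim tensor (dot product)
assert torch.matmul(torch.randn(M), torch.randn(M)).shape == ()

# 2. 1D x 2D -> 1D (vector-matrix product)
assert torch.matmul(torch.randn(M), torch.randn(M, N)).shape == (N,)

# 3. 2D x 1D -> 1D (matrix-vector product)
assert torch.matmul(torch.randn(M, N), torch.randn(N)).shape == (M,)

# 4. >2D x 1D -> batched matrix-vector product; the 1D operand is contracted
assert torch.matmul(torch.randn(B, M, N), torch.randn(N)).shape == (B, M)
```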

jacobhinkle commented 5 months ago

We should also make sure we support more than one batch dim, including, for example, batch dims between the M and K dims of A. Since there is potential to implement these differently based on nDims(), we should also add a benchmark (a single input size is fine) comparing bmm with batch_size=1 against a 2D matmul of the same size, as a perf sanity check; a rough sketch of that comparison is below.
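
A minimal sketch of that sanity check, written against plain PyTorch for illustration only; the actual benchmark would go through the nvFuser frontend and the repo's benchmark harness, and the sizes here are placeholders:

```python
import torch

M = N = K = 2048
a2d = torch.randn(M, K, device="cuda", dtype=torch.half)
b2d = torch.randn(K, N, device="cuda", dtype=torch.half)

def time_ms(fn, iters=100):
    # Crude CUDA-event timing; a real benchmark would use the repo's harness.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    fn()  # warm-up
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

t_2d = time_ms(lambda: torch.matmul(a2d, b2d))
# Same problem expressed as a bmm with batch_size=1; should perform about the same.
t_bmm = time_ms(lambda: torch.matmul(a2d.unsqueeze(0), b2d.unsqueeze(0)))
print(f"2D matmul: {t_2d:.3f} ms, bmm(batch=1): {t_bmm:.3f} ms")
```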

Priya2698 commented 5 months ago

This issue will now be resolved through work on the new IR nodes: https://github.com/NVIDIA/Fuser/issues/2149

Priya2698 commented 4 months ago

PRs #2175 and #2209 add support for all cases accepted by torch.matmul.
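
As a usage illustration only, here is a minimal sketch of how one of the mixed-rank cases ([B, M, N] x [N,]) might be expressed through the nvFuser Python frontend. The define_tensor/execute keyword signatures are assumed from the frontend of this era and may differ:

```python
import torch
from nvfuser import FusionDefinition, DataType

with FusionDefinition() as fd:
    # 3D "A" and 1D "B"; -1 marks symbolic extents (assumed keyword names).
    a = fd.define_tensor(shape=[-1, -1, -1], dtype=DataType.Half)
    b = fd.define_tensor(shape=[-1], dtype=DataType.Half)
    out = fd.ops.matmul(a, b)  # expected output shape: [B, M]
    fd.add_output(out)

outs = fd.execute([
    torch.randn(3, 4, 5, device="cuda", dtype=torch.half),
    torch.randn(5, device="cuda", dtype=torch.half),
])
print(outs[0].shape)  # expected: torch.Size([3, 4])
```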