google-research / dex-lang

Research language for array processing in the Haskell/ML family

BSD 3-Clause "New" or "Revised" License

Fast(er) matmul #1253

Status: Closed (closed by axch 1 year ago)

axch commented 1 year ago

Hand-tiling matrix multiply in Dex speeds it up from ~100ms to ~5.5ms on my laptop (for a 500x500x500 dense multiplication).

There are some caveats, though:

- The tile sizes are just arbitrary numbers; they are not tuned for different hardware.
- The hand-tiled version relies on Writer to construct the output in place, which defeats output fusion. (But then again, I don't know that the previous implementation would have fused well on the output either.)
- Adding @noinline to the hand-tiled version nerfs its performance back to ~40ms, presumably because it defeats LLVM's vectorizer (though I'm not sure why it would).
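
For readers unfamiliar with the technique, here is a minimal sketch of what hand-tiling a matrix multiply means. It is written in plain Python rather than Dex, and it is not the code from this change: the function names, the loop order, and the default tile sizes of 4 are all assumptions chosen for illustration. Like the version described above, it accumulates into a preallocated output in place, and its tile sizes are arbitrary rather than tuned.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile_i=4, tile_j=4, tile_k=4):
    """Tiled variant (hypothetical helper, not the Dex implementation).

    The three outer loops walk over tiles; the three inner loops do the
    multiply-accumulate within one tile. The tile sizes are arbitrary
    defaults, not tuned for any hardware.
    """
    n, m, p = len(a), len(b), len(b[0])
    # The output is allocated once and then updated in place.
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile_i):
        for k0 in range(0, m, tile_k):
            for j0 in range(0, p, tile_j):
                # min(...) handles dimensions that are not a multiple
                # of the tile size.
                for i in range(i0, min(i0 + tile_i, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + tile_k, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + tile_j, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

In interpreted Python this restructuring will not reproduce the speedup reported above; the benefit of tiling comes from cache reuse and vectorization in compiled code. The sketch is only meant to show the loop structure.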