ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

Workgroup Reversal #2864

Open shivadbhavsar opened 6 months ago

shivadbhavsar commented 6 months ago

Add functionality to apply workgroup reversals to increase cache hits.
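A common rationale for workgroup reversal, assumed here rather than stated in this issue, is that the tiles a producer kernel writes last are the ones most likely to still be in cache when it finishes. A consumer kernel that starts from the end can therefore reuse them. The toy model below sketches that idea with a fully associative LRU cache that holds whole tiles. `LRUCache`, `workgroup_order`, `consumer_hits`, and all sizes are hypothetical and do not reflect real GPU cache geometry or MIGraphX code.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that counts hits on tile IDs (hypothetical model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # tile ID -> None, ordered from least to most recently used
        self.hits = 0

    def access(self, tile: int) -> None:
        if tile in self.lines:
            self.hits += 1
            self.lines.move_to_end(tile)  # mark as most recently used
        else:
            self.lines[tile] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used tile


def workgroup_order(num_workgroups: int, reverse: bool) -> list:
    """Tile index handled by each workgroup, in dispatch order."""
    # With reversal, workgroup i handles tile (num_workgroups - 1 - i).
    if reverse:
        return [num_workgroups - 1 - i for i in range(num_workgroups)]
    return list(range(num_workgroups))


def consumer_hits(num_tiles: int, capacity: int, reverse: bool) -> int:
    """Cache hits seen by a consumer kernel that reads a producer's output tiles."""
    cache = LRUCache(capacity)
    # Producer kernel writes its output tiles in forward order.
    for tile in workgroup_order(num_tiles, reverse=False):
        cache.access(tile)
    cache.hits = 0  # count only the consumer kernel's hits
    # Consumer kernel reads the same tiles, optionally in reversed order.
    for tile in workgroup_order(num_tiles, reverse):
        cache.access(tile)
    return cache.hits


print(consumer_hits(num_tiles=8, capacity=4, reverse=False))  # → 0
print(consumer_hits(num_tiles=8, capacity=4, reverse=True))   # → 4
```

With 8 tiles and room for 4, the forward-ordered consumer gets no hits, while the reversed one reuses the 4 most recently written tiles. If `capacity` is at least `num_tiles`, both orders hit on every tile, so in this model reversal only helps when the intermediate data is larger than the cache.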

shivadbhavsar commented 5 months ago

Initial work showed no performance difference. Rocprofiler results on a trimmed UNet:

  1. Using `MIGRAPHX_MLIR_USE_SPECIFIC_OPS=attention`: cache hits are mostly the same with and without reversal, though some kernels show considerably lower hits with reversal.
  2. Using `MIGRAPHX_MLIR_USE_SPECIFIC_OPS=convolution,dot,fused,attention`: cache hits are noticeably higher with reversal for some kernels, but overall performance is consistently worse with reversal.

shivadbhavsar commented 5 months ago

Next steps: understand cache-hit behavior with even smaller graphs.

  1. Tested a `mul -> dot -> add` program, which is compiled as `mul -> dot_add`; when reversal is applied, `mlir_dot_add` is reverse-indexed. Cache hits do not change when reversal is applied.
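As a hedged illustration of what reverse indexing could mean for a tiled kernel such as `mlir_dot_add`, the sketch below maps a linear workgroup ID to an output tile of a GEMM-like grid, reversing the ID first when reversal is enabled. The row-major tile layout, the function name `tile_for_workgroup`, and the tile counts are assumptions for illustration; this issue does not describe the actual mapping in the MLIR-generated kernel.

```python
def tile_for_workgroup(wg_id: int, m_tiles: int, n_tiles: int, reverse: bool) -> tuple:
    """Map a linear workgroup ID to the (row, col) output tile it computes.

    Hypothetical mapping: row-major over an m_tiles x n_tiles grid of output tiles.
    """
    num_workgroups = m_tiles * n_tiles
    if not 0 <= wg_id < num_workgroups:
        raise ValueError("workgroup ID out of range")
    if reverse:
        # Reverse indexing: the first-dispatched workgroup takes the last tile.
        wg_id = num_workgroups - 1 - wg_id
    return divmod(wg_id, n_tiles)  # (row, col) in row-major order


for wg in range(4):
    print(wg, tile_for_workgroup(wg, m_tiles=2, n_tiles=2, reverse=True))
```

Reversing the linear ID before decomposing it keeps each workgroup's per-tile work unchanged and only alters which tiles are visited first, which is why such a change can affect cache reuse without affecting correctness.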