Open shivadbhavsar opened 6 months ago
Initial work resulted in no perf difference. Rocprofiler results on trimmed unet:
MIGRAPHX_MLIR_USE_SPECIFIC_OPS=attention
- Cache hits are mostly the same with and without reversal (with some being considerably lower)

Next Steps: Understand cache hits with even smaller graphs
For a mul -> dot -> add program, which is compiled as mul -> dot_add (where mlir_dot_add is reverse indexed when reversal is applied), there is no change in cache hits when reversal is applied.
Add functionality to apply workgroup reversals to increase cache hits.
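
For anyone unfamiliar with the idea, here is a minimal Python sketch of what "reverse indexing" a kernel's workgroups could look like. It is not the MIGraphX or rocMLIR implementation: `reversed_workgroup_id` and `tile_order` are hypothetical names, and it assumes each workgroup handles one tile and that workgroups are dispatched roughly in id order. Under those assumptions, the intent of reversal would be that the consumer kernel starts on the tiles the producer kernel wrote last, which are the ones most likely to still be in cache.

```python
def reversed_workgroup_id(group_id: int, num_groups: int) -> int:
    """Mirror a workgroup id so that group 0 handles the last tile."""
    return num_groups - 1 - group_id


def tile_order(num_groups: int, reverse: bool) -> list[int]:
    """Tiles visited, in dispatch order, assuming groups launch in id order."""
    return [
        reversed_workgroup_id(g, num_groups) if reverse else g
        for g in range(num_groups)
    ]


# Producer kernel (e.g. mul) walks tiles front to back.
producer = tile_order(4, reverse=False)
# Consumer kernel (e.g. dot_add) with reversal walks them back to front,
# so its first tile is the producer's last one.
consumer = tile_order(4, reverse=True)
print(producer, consumer)
```

Whether this actually improves hit rates depends on how the hardware schedules workgroups and on the cache size relative to the tensors, which may be why no change shows up on small graphs.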