ROCm / rocWMMA

rocWMMA
https://rocm.docs.amd.com/projects/rocWMMA/
MIT License

Adds support for Stream-K algorithm for HGEMM. #398

Closed: neoblizz closed this 5 months ago

neoblizz commented 6 months ago

We were using rocWMMA for some initial HIP-based development of Stream-K, and I thought it would be best to merge that work into rocWMMA instead.

Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, and John D. Owens. Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU. arXiv, January 2023. Appeared as a poster paper in Proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, February–March 2023. https://arxiv.org/abs/2301.03598
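For readers unfamiliar with the "work-centric" decomposition named in the paper title, here is a minimal host-side sketch of the core idea as I understand it: the total number of MAC-loop iterations (output tiles times iterations per tile) is split evenly across workgroups, so one workgroup's range may cross tile boundaries. This is not the code in this PR. `Segment` and `stream_k_partition` are hypothetical names used only for illustration, and the sketch leaves out the fix-up step that combines partial tiles as well as any hybrid scheduling.

```cpp
#include <algorithm>
#include <vector>

// One contiguous piece of work: MAC-loop iterations [k_begin, k_end)
// of a single output tile.
struct Segment
{
    int tile;
    int k_begin;
    int k_end;
};

// Hypothetical helper (not part of rocWMMA): split tiles * iters_per_tile
// iterations as evenly as possible over num_workgroups, and break each
// workgroup's range into per-tile segments.
std::vector<std::vector<Segment>> stream_k_partition(int tiles,
                                                     int iters_per_tile,
                                                     int num_workgroups)
{
    const long long total = 1LL * tiles * iters_per_tile;
    std::vector<std::vector<Segment>> plan(num_workgroups);

    for(int wg = 0; wg < num_workgroups; ++wg)
    {
        // Global iteration range [it, end) owned by this workgroup.
        long long       it  = total * wg / num_workgroups;
        const long long end = total * (wg + 1) / num_workgroups;

        while(it < end)
        {
            const int       tile       = static_cast<int>(it / iters_per_tile);
            const long long tile_begin = 1LL * tile * iters_per_tile;
            const long long tile_end   = tile_begin + iters_per_tile;
            const long long stop       = std::min(end, tile_end);

            plan[wg].push_back({tile,
                                static_cast<int>(it - tile_begin),
                                static_cast<int>(stop - tile_begin)});
            it = stop; // continue into the next tile, if any
        }
    }
    return plan;
}
```

With 3 tiles, 4 iterations per tile, and 2 workgroups, each workgroup gets 6 iterations: the first covers all of tile 0 and the first half of tile 1, the second covers the rest of tile 1 and all of tile 2. Tile 1 is therefore shared, which is where a partial-sum fix-up would be needed in a real kernel.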

neoblizz commented 6 months ago

@dlangbe I changed the base branch as requested.

neoblizz commented 5 months ago

@cgmillette @dlangbe anything else you need from me?