We were using rocWMMA for some initial HIP-based development of Stream-K, so I thought it would be best to merge it into rocWMMA instead.
Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, and John D. Owens. Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU. arXiv, January 2023. Appeared as a poster paper in Proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, February–March 2023. https://arxiv.org/abs/2301.03598