ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

Universal streamk with atomics #1360

Closed: hsadasiv closed this 2 months ago

hsadasiv commented 3 months ago

universal_streamk with atomics is implemented and integrated with ckprofiler. Many of the modified files are ckprofiler instances and need only be skimmed. universal_streamk supports 1-tile and 2-tile stream-k and a programmable grid_size (defaulting to max occupancy) for a persistent kernel. As this PR is already large, a second PR will follow later to support stream-k with reduction and advanced tile swizzling mechanisms. An example has been added to the README to illustrate usage: https://github.com/ROCm/composable_kernel/blob/universal_streamk/example/01_gemm/README.md
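For reviewers unfamiliar with the general stream-k idea, here is a minimal host-side sketch of how a fixed grid of persistent workgroups can share the GEMM's main-loop iterations evenly, with tiles that straddle two workgroups accumulated additively (standing in for atomic adds on the GPU). This is an illustration under stated assumptions, not the code in this PR: `IterRange`, `streamk_range`, and `simulate_streamk` are hypothetical names, and the 1-tile/2-tile variants and the default grid_size selection are not modeled.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical host-side model of stream-k partitioning; NOT the CK implementation.
struct IterRange
{
    int begin; // first main-loop iteration owned by the workgroup (inclusive)
    int end;   // one past the last iteration owned by the workgroup
};

// Split total_iters iterations as evenly as possible over grid_size workgroups.
// The first (total_iters % grid_size) workgroups receive one extra iteration.
inline IterRange streamk_range(int wg, int grid_size, int total_iters)
{
    const int base  = total_iters / grid_size;
    const int rem   = total_iters % grid_size;
    const int begin = wg * base + std::min(wg, rem);
    const int end   = begin + base + (wg < rem ? 1 : 0);
    return {begin, end};
}

// Each workgroup walks its contiguous iteration range. Whenever the range
// crosses a tile boundary, the partial contribution is added to that tile's
// accumulator. On a GPU this add would be an atomic add to the output tile.
inline std::vector<int> simulate_streamk(int num_tiles, int iters_per_tile, int grid_size)
{
    std::vector<int> tile_acc(num_tiles, 0);
    const int total_iters = num_tiles * iters_per_tile;
    for(int wg = 0; wg < grid_size; ++wg)
    {
        const IterRange r = streamk_range(wg, grid_size, total_iters);
        int it            = r.begin;
        while(it < r.end)
        {
            const int tile     = it / iters_per_tile;
            const int tile_end = (tile + 1) * iters_per_tile;
            const int seg_end  = std::min(r.end, tile_end);
            tile_acc[tile] += seg_end - it; // stand-in for an atomic accumulation
            it = seg_end;
        }
    }
    return tile_acc;
}
```

In this model, every tile ends up with exactly `iters_per_tile` contributions regardless of `grid_size`, which is the property the atomic accumulation is meant to preserve when a tile is split across workgroups.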

aosewski commented 3 months ago

@hsadasiv I believe you've unintentionally changed file permissions: [image]

hsadasiv commented 2 months ago

@aosewski Thank you for your detailed review. I appreciate your effort! I have tried to address all of your comments. Please check and approve.