ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

LDS prefetch pipeline support for FlashAttention #1349

Open ramjana opened 2 weeks ago

ramjana commented 2 weeks ago

Adds support for an LDS prefetch pipeline. Prefetching is currently enabled for one of the GEMM sources that is read from LDS; the same approach could be applied to both sources.
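
For readers unfamiliar with the idea, here is a minimal host-side sketch of a two-buffer prefetch loop for one GEMM source. It is not the code in this PR and does not use any composable_kernel API: the names `kTileK`, `Tile`, `load_a_tile`, `mac_tile`, and `gemm_prefetch` are hypothetical, and a plain `std::vector` stands in for LDS. The model runs serially, so it only shows the buffer rotation; on a GPU the load of the next tile would be issued so that it overlaps with the math on the current tile.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile depth along K; not taken from the PR.
constexpr std::size_t kTileK = 4;

// Stand-in for one LDS tile: m rows by kTileK columns of A, row-major.
using Tile = std::vector<float>;

// Copy A[:, k0 : k0 + kTileK] into a staging buffer (models global -> LDS).
void load_a_tile(const std::vector<float>& a, std::size_t m, std::size_t k,
                 std::size_t k0, Tile& tile) {
    tile.resize(m * kTileK);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t kk = 0; kk < kTileK; ++kk)
            tile[i * kTileK + kk] = a[i * k + k0 + kk];
}

// Accumulate C += tile * B[k0 : k0 + kTileK, :].
void mac_tile(const Tile& tile, const std::vector<float>& b, std::size_t m,
              std::size_t n, std::size_t k0, std::vector<float>& c) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t kk = 0; kk < kTileK; ++kk)
                c[i * n + j] += tile[i * kTileK + kk] * b[(k0 + kk) * n + j];
}

// Row-major GEMM, C (m x n) = A (m x k) * B (k x n), with A staged through
// two buffers so the next tile is fetched before the current one is consumed.
std::vector<float> gemm_prefetch(const std::vector<float>& a,
                                 const std::vector<float>& b, std::size_t m,
                                 std::size_t n, std::size_t k) {
    assert(k > 0 && k % kTileK == 0);  // sketch handles whole tiles only
    std::vector<float> c(m * n, 0.0f);
    std::array<Tile, 2> buf;
    std::size_t cur = 0;

    load_a_tile(a, m, k, 0, buf[cur]);  // prologue: fill the first buffer
    for (std::size_t k0 = 0; k0 < k; k0 += kTileK) {
        const std::size_t nxt = cur ^ 1;
        if (k0 + kTileK < k)
            load_a_tile(a, m, k, k0 + kTileK, buf[nxt]);  // prefetch next tile
        mac_tile(buf[cur], b, m, n, k0, c);               // math on current tile
        cur = nxt;                                        // rotate buffers
    }
    return c;
}
```

Calling `gemm_prefetch(a, b, m, n, k)` with row-major inputs should give the same result as a plain triple-loop GEMM. Staging the second source through its own pair of buffers would follow the same pattern.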