ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/
Other
251 stars 102 forks

Fix to the using of static_for in amd_buffer_addressing.hpp #1337

Closed qianfengz closed 2 weeks ago

qianfengz commented 2 weeks ago

This fixes a compilation issue that occurs only when building the xformers C++ extension for ROCm. The issue does not occur with the CK example/ck_tile/01_fmha example, since the existing overloads for 1, 2, 4, 8, 16, and 32 cover every case that example needs.
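
To illustrate the kind of failure described above, here is a minimal, self-contained sketch. It is not the code from this PR or from amd_buffer_addressing.hpp. The names `static_for_sketch`, `load_native`, and `load_any` are hypothetical, and the width 64 is chosen purely as an example of a size outside the 1, 2, 4, 8, 16, 32 set. The sketch assumes C++17 and shows one way a compile-time loop can split an unsupported width into chunks that do have an overload.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a compile-time loop; NOT CK's static_for.
template <std::size_t... Is, typename F>
constexpr void static_for_impl(std::index_sequence<Is...>, F&& f)
{
    // Call f once per index, passing the index as a compile-time constant.
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t N, typename F>
constexpr void static_for_sketch(F&& f)
{
    static_for_impl(std::make_index_sequence<N>{}, std::forward<F>(f));
}

// Hypothetical "native" load that exists only for the listed widths,
// mimicking a fixed set of overloads. Any other N fails to compile.
template <std::size_t N>
void load_native(const float* src, float* dst)
{
    static_assert(N == 1 || N == 2 || N == 4 || N == 8 || N == 16 || N == 32,
                  "no overload for this width");
    for (std::size_t i = 0; i < N; ++i)
        dst[i] = src[i];
}

// Assumed workaround: widths above 32 are split into 32-wide chunks,
// with the chunk index known at compile time.
template <std::size_t N>
void load_any(const float* src, float* dst)
{
    if constexpr (N <= 32)
    {
        load_native<N>(src, dst);
    }
    else
    {
        static_assert(N % 32 == 0, "sketch only handles multiples of 32");
        static_for_sketch<N / 32>([&](auto chunk) {
            constexpr std::size_t offset = decltype(chunk)::value * 32;
            load_native<32>(src + offset, dst + offset);
        });
    }
}
```

In this sketch, calling `load_native<64>` directly would trip the `static_assert`, which is analogous to a build that only breaks once a caller requests a width the existing overloads do not cover. `load_any<64>` compiles because it dispatches to two 32-wide calls.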