ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

[CK_TILE] fmha forward split-kv + combine kernels #1338

Closed poyenc closed 3 months ago

poyenc commented 3 months ago

Please merge #1355 before this PR.

poyenc commented 3 months ago

I'm fixing the remaining bugs. Please do not merge this PR yet.

poyenc commented 3 months ago

I fixed all the bugs introduced by the new code. The LSE-scale-applying logic in the combine kernel still needs some tweaks.
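For readers unfamiliar with the term, the LSE scaling done by a split-KV combine step can be sketched roughly as below. This is a minimal pure-Python illustration of the general technique, not the CK_TILE kernel code from this PR: the function names `attention_partial` and `combine_splits` are hypothetical, and it assumes each split returns an already-normalized output together with the log-sum-exp (LSE) of its scores.

```python
import math

def attention_partial(q, keys, values):
    """Attention for one query over one KV split.

    Returns the split's normalized output and the log-sum-exp of its scores.
    (Hypothetical helper for illustration only.)
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)                       # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def combine_splits(partials):
    """Merge per-split (output, LSE) pairs into the full-attention result."""
    lse_max = max(lse for _, lse in partials)
    lse_total = lse_max + math.log(
        sum(math.exp(lse - lse_max) for _, lse in partials))
    dim = len(partials[0][0])
    out = [0.0] * dim
    for o, lse in partials:
        scale = math.exp(lse - lse_total)  # this split's share of the softmax mass
        for d in range(dim):
            out[d] += scale * o[d]
    return out, lse_total

# Made-up example data: one query, five keys/values, split unevenly (2 + 3).
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [2.0, -1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

full_out, full_lse = attention_partial(q, keys, values)
parts = [attention_partial(q, keys[:2], values[:2]),
         attention_partial(q, keys[2:], values[2:])]
merged_out, merged_lse = combine_splits(parts)
```

Under these assumptions, `merged_out` and `merged_lse` should match `full_out` and `full_lse` up to floating-point error, including when the splits have unequal lengths.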

poyenc commented 3 months ago

It's possible to remove the uneven-split checks in the splitkv pipeline. I'm working on it now.

poyenc commented 3 months ago

The pipeline refinement is done.

poyenc commented 3 months ago

We should merge #1355 first for better file organization.