TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
148 stars, 10 forks
chore: Add `TiledFlashAttention` to improve usage and fix `CMakeLists` to add all examples automatically. #132
Closed
KuangjuX closed 2 months ago
Resolved #120.