TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

chore: Add `TiledFlashAttention` to improve usage and fix `CMakeLists` to add all examples automatically. #132

Closed: KuangjuX closed this 2 months ago.

KuangjuX commented 2 months ago:

Resolved #120.