ROCm / triton

Development repository for the Triton language and compiler
MIT License

Merge changes from upstream FA bwd kernel #444

Closed: vgokhale closed this 8 months ago

vgokhale commented 8 months ago

The upstream kernel includes some optimizations, most notably splitting the causal-mask handling across two calls so that only one of them incurs the warp divergence penalty. Performance details here.
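To illustrate the idea of splitting the causal mask across two calls, here is a minimal NumPy sketch. It is not the upstream kernel or the code in this PR: it uses a forward-style accumulation rather than the backward pass, omits the usual max-subtraction for numerical stability, and assumes the sequence length is a multiple of the tile size. The names `BLOCK`, `_accumulate`, `causal_attention_split`, and `causal_attention_reference` are hypothetical. Tiles strictly left of the diagonal are fully visible and go through an unmasked call; only the diagonal tile goes through the masked call, which is where the per-element branch would live on a GPU.

```python
import numpy as np

BLOCK = 4  # hypothetical tile size; sequence length is assumed to be a multiple of it


def _accumulate(q_blk, k, v, q_start, k_lo, k_hi, acc, row_sum, masked):
    """Add the contribution of key tiles in [k_lo, k_hi) to acc and row_sum in place."""
    for k_start in range(k_lo, k_hi, BLOCK):
        scores = q_blk @ k[k_start:k_start + BLOCK].T
        p = np.exp(scores)  # no max-subtraction: fine for small inputs only
        if masked:
            # Per-element causal test; only the masked call pays for it.
            rows = q_start + np.arange(q_blk.shape[0])[:, None]
            cols = k_start + np.arange(BLOCK)[None, :]
            p = np.where(cols <= rows, p, 0.0)
        acc += p @ v[k_start:k_start + BLOCK]
        row_sum += p.sum(axis=1)


def causal_attention_split(q, k, v):
    """Causal attention computed with one unmasked and one masked call per query tile."""
    n = q.shape[0]
    out = np.empty((n, v.shape[1]))
    for q_start in range(0, n, BLOCK):
        q_blk = q[q_start:q_start + BLOCK]
        acc = np.zeros((BLOCK, v.shape[1]))
        row_sum = np.zeros(BLOCK)
        # Call 1: tiles strictly left of the diagonal are fully visible, so no mask.
        _accumulate(q_blk, k, v, q_start, 0, q_start, acc, row_sum, masked=False)
        # Call 2: the diagonal tile is the only one that needs the causal mask.
        _accumulate(q_blk, k, v, q_start, q_start, q_start + BLOCK, acc, row_sum, masked=True)
        out[q_start:q_start + BLOCK] = acc / row_sum[:, None]
    return out


def causal_attention_reference(q, k, v):
    """Dense reference: mask the full score matrix with a lower-triangular mask."""
    n = q.shape[0]
    p = np.exp(q @ k.T) * np.tril(np.ones((n, n)))
    return (p @ v) / p.sum(axis=1, keepdims=True)
```

Both functions should agree on small random inputs, which shows that the split changes where the masking work happens and leaves the result unchanged. The sketch says nothing about actual GPU performance.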