Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
bug: start_position support for the fused attention kernel #329
Open
ipoletaev opened 1 year ago
Description
Using a start position index with the fused attention kernel does not work.
Steps to reproduce
Expected Behavior
Predictions almost identical to those of the vanilla implementation, for any start position index.
Actual Behavior
Returns `nan` for any `START_IDX != 0`.
Your environment
torch==2.0.0 triton==2.0.0