Dao-AILab / flash-attention

Fast and memory-efficient exact attention

flash decoding algorithm numerical error #949

Open hanzz2007 opened 3 months ago

hanzz2007 commented 3 months ago

In combine_attn_seqk_parallel, the global maximum score m is not computed and the partial outputs O_i are not properly rescaled by it, so the combine step might have more numerical error than v1 and v2.
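For reference, the standard numerically stable way to merge per-split partial results uses each split's running max m_i and sum of exponentials l_i (equivalently its log-sum-exp). Below is a minimal NumPy sketch of that reference combine, written as an illustration rather than as the actual kernel code; the variable names are hypothetical and do not correspond to the CUDA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, num_splits = 8, 1024, 4
scores = rng.normal(scale=10.0, size=n)        # attention scores for one query
values = rng.normal(size=(n, d))               # value vectors

# Reference: full softmax attention over all keys.
p = np.exp(scores - scores.max())
ref = (p[:, None] * values).sum(axis=0) / p.sum()

# Per-split partial results, as a split-KV decode would produce them.
o_parts, m_parts, l_parts = [], [], []
for s, v in zip(np.split(scores, num_splits), np.split(values, num_splits)):
    m_i = s.max()                              # per-split max score
    p_i = np.exp(s - m_i)
    l_i = p_i.sum()                            # per-split sum of exponentials
    o_parts.append((p_i[:, None] * v).sum(axis=0) / l_i)
    m_parts.append(m_i)
    l_parts.append(l_i)
o_parts, m_parts, l_parts = map(np.asarray, (o_parts, m_parts, l_parts))

# Stable combine: rescale every split to the global max m before summing.
m = m_parts.max()
w = np.exp(m_parts - m) * l_parts              # un-normalized split weights
out = (w[:, None] * o_parts).sum(axis=0) / w.sum()

# Should agree with the reference to roughly machine epsilon in float64.
print(np.abs(out - ref).max())
```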

hanzz2007 commented 3 months ago

@tridao

tridao commented 3 months ago

Can you give a short script showing the numerical error?
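A rough sketch of the kind of comparison script being asked for, checking the split-KV decode path against a float64 PyTorch reference. It assumes the public flash_attn_with_kvcache interface and its num_splits argument behave as in the installed flash-attn version; treat those as assumptions, not a confirmed repro.

```python
# Sketch of a possible repro; flash_attn_with_kvcache and num_splits are
# assumed to match the installed flash-attn version.
import torch
from flash_attn import flash_attn_with_kvcache

torch.manual_seed(0)
batch, seqlen_k, nheads, headdim = 2, 8192, 8, 128
device, dtype = "cuda", torch.float16

q = torch.randn(batch, 1, nheads, headdim, device=device, dtype=dtype)
k = torch.randn(batch, seqlen_k, nheads, headdim, device=device, dtype=dtype)
v = torch.randn(batch, seqlen_k, nheads, headdim, device=device, dtype=dtype)

# float64 reference attention for a single decoding step.
scale = headdim ** -0.5
s = torch.einsum("bqhd,bkhd->bhqk", q.double(), k.double()) * scale
ref = torch.einsum("bhqk,bkhd->bqhd", s.softmax(dim=-1), v.double())

out_1 = flash_attn_with_kvcache(q, k, v, num_splits=1)  # no split-KV combine
out_n = flash_attn_with_kvcache(q, k, v, num_splits=8)  # exercises the combine kernel

print("num_splits=1 vs ref:", (out_1.double() - ref).abs().max().item())
print("num_splits=8 vs ref:", (out_n.double() - ref).abs().max().item())
```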