Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Bug] Deterministic backward is not implemented in the real code #746

Open li126com opened 10 months ago

li126com commented 10 months ago

The commit "Implement deterministic backward (thanks to Meituan)" adds the feature to the kernel-level API, but it is not wired up in mha.py. The `deterministic` parameter should be passed through in both `FlashSelfAttention` and `FlashCrossAttention`.
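For context, a minimal sketch of what such a fix could look like: threading a `deterministic` flag through the two mha.py wrappers so it reaches `flash_attn_qkvpacked_func` / `flash_attn_kvpacked_func`, which already accept it. This is illustrative only, not the actual patch in PR #748; the real classes in flash_attn/modules/mha.py also handle varlen/cu_seqlens code paths omitted here.

```python
# Hypothetical sketch of passing `deterministic` through the mha.py wrappers.
import torch.nn as nn
from flash_attn import flash_attn_qkvpacked_func, flash_attn_kvpacked_func

class FlashSelfAttention(nn.Module):
    """Simplified wrapper; the real class also supports varlen inputs."""
    def __init__(self, causal=False, softmax_scale=None, attention_dropout=0.0,
                 deterministic=False):
        super().__init__()
        self.causal = causal
        self.softmax_scale = softmax_scale
        self.drop_p = attention_dropout
        # New flag: forwarded to the kernel so the backward pass is deterministic.
        self.deterministic = deterministic

    def forward(self, qkv):
        return flash_attn_qkvpacked_func(
            qkv,
            self.drop_p if self.training else 0.0,
            softmax_scale=self.softmax_scale,
            causal=self.causal,
            deterministic=self.deterministic,
        )

class FlashCrossAttention(nn.Module):
    """Same idea for the cross-attention wrapper."""
    def __init__(self, causal=False, softmax_scale=None, attention_dropout=0.0,
                 deterministic=False):
        super().__init__()
        self.causal = causal
        self.softmax_scale = softmax_scale
        self.drop_p = attention_dropout
        self.deterministic = deterministic

    def forward(self, q, kv):
        return flash_attn_kvpacked_func(
            q, kv,
            self.drop_p if self.training else 0.0,
            softmax_scale=self.softmax_scale,
            causal=self.causal,
            deterministic=self.deterministic,
        )
```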

tridao commented 10 months ago

Thanks, do you want to add a PR?

li126com commented 10 months ago

> Thanks, do you want to add a PR?

Sure: https://github.com/Dao-AILab/flash-attention/pull/748