Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Bug] Deterministic backward is not implemented in the real code #746

Open li126com opened 10 months ago

li126com commented 10 months ago

The commit "Implement deterministic backward (thanks to Meituan)" adds the feature to the kernel-level API, but it is not wired up in mha.py. The `deterministic` parameter should be passed through in both `FlashSelfAttention` and `FlashCrossAttention`.
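For context, a minimal sketch of what such a fix could look like: threading a `deterministic` flag through the two mha.py wrappers so it reaches `flash_attn_qkvpacked_func` / `flash_attn_kvpacked_func`, which already accept it. This is illustrative only, not the actual patch in PR #748; the real classes in flash_attn/modules/mha.py also handle varlen/cu_seqlens code paths omitted here.

```python
# Hypothetical sketch of passing `deterministic` through the mha.py wrappers.
import torch.nn as nn
from flash_attn import flash_attn_qkvpacked_func, flash_attn_kvpacked_func

class FlashSelfAttention(nn.Module):
    """Simplified wrapper; the real class also supports varlen inputs."""
    def __init__(self, causal=False, softmax_scale=None, attention_dropout=0.0,
                 deterministic=False):
        super().__init__()
        self.causal = causal
        self.softmax_scale = softmax_scale
        self.drop_p = attention_dropout
        # New flag: forwarded to the kernel so the backward pass is deterministic.
        self.deterministic = deterministic

    def forward(self, qkv):
        return flash_attn_qkvpacked_func(
            qkv,
            self.drop_p if self.training else 0.0,
            softmax_scale=self.softmax_scale,
            causal=self.causal,
            deterministic=self.deterministic,
        )

class FlashCrossAttention(nn.Module):
    """Same idea for the cross-attention wrapper."""
    def __init__(self, causal=False, softmax_scale=None, attention_dropout=0.0,
                 deterministic=False):
        super().__init__()
        self.causal = causal
        self.softmax_scale = softmax_scale
        self.drop_p = attention_dropout
        self.deterministic = deterministic

    def forward(self, q, kv):
        return flash_attn_kvpacked_func(
            q, kv,
            self.drop_p if self.training else 0.0,
            softmax_scale=self.softmax_scale,
            causal=self.causal,
            deterministic=self.deterministic,
        )
```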

tridao commented 10 months ago

Thanks, do you want to add a PR?

li126com commented 10 months ago

> Thanks, do you want to add a PR?

Sure: https://github.com/Dao-AILab/flash-attention/pull/748