On line 10 of Algorithm 1 (the FlashAttention-2 forward pass) in the FlashAttention-2 paper, the update for $\mathbf{O}_i^{(j)}$ contains the term $\text{diag}\left(e^{m_i^{(j-1)} - m_i^{(j)}}\right)^{-1}\mathbf{O}_i^{(j-1)}$. This term should not have the $^{-1}$; it should just be $\text{diag}\left(e^{m_i^{(j-1)} - m_i^{(j)}}\right)\mathbf{O}_i^{(j-1)}$.
Similarly, the online softmax trick example at the top of page 6 should also not have the $^{-1}$ on the factor in front of $\tilde{\mathbf{O}}^{(1)}$.
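To spell out why the $^{-1}$ cannot be right (this is just the standard rescaling argument, not quoted from the paper): the running accumulator before block $j$ is based at the old max, $\mathbf{O}_i^{(j-1)} = \sum_{k \le j-1} e^{\mathbf{S}_i^{(k)} - m_i^{(j-1)}}\mathbf{V}_k$. When the running max moves from $m_i^{(j-1)}$ to $m_i^{(j)}$, each old term has to be re-based to the new max, which means multiplying by

$$e^{m_i^{(j-1)} - m_i^{(j)}} \cdot e^{\mathbf{S}_i^{(k)} - m_i^{(j-1)}} = e^{\mathbf{S}_i^{(k)} - m_i^{(j)}},$$

i.e. scaling the old output down by $\text{diag}\left(e^{m_i^{(j-1)} - m_i^{(j)}}\right)$, a factor $\le 1$ since the running max can only increase. The inverse factor would scale the old terms up instead and break this identity. Consistently, the $\ell_i^{(j)}$ update in the same algorithm uses $e^{m_i^{(j-1)} - m_i^{(j)}}$ with no inverse.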
Lastly, there is a typo: a stray line in the online softmax trick derivation on page 6 should be deleted. It looks like it was copied from the online softmax trick derivation on page 4 (for FlashAttention-1) and was not removed.
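For anyone who wants to double-check the $^{-1}$ claims numerically, here is a minimal NumPy sketch (mine, not code from the paper or its repo; the block sizes and variable names are made up for the example). It runs the FlashAttention-2 style block-by-block accumulation with the corrected factor $e^{m^{(j-1)} - m^{(j)}}$ and compares against plain softmax attention for one query block:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # head dimension (made up for the example)
Br, Bc = 4, 5  # query block rows, key/value block columns
Tc = 3         # number of key/value blocks

Q = rng.standard_normal((Br, d))
K = rng.standard_normal((Tc * Bc, d))
V = rng.standard_normal((Tc * Bc, d))

# Reference: ordinary softmax attention for this query block.
S = Q @ K.T
P = np.exp(S - S.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)
O_ref = P @ V

# Online accumulation over key/value blocks, FlashAttention-2 style,
# with the corrected rescaling factor e^{m^(j-1) - m^(j)} (no inverse).
m = np.full(Br, -np.inf)       # running row max
l = np.zeros(Br)               # running row sum of exponentials
O = np.zeros((Br, d))          # running unnormalized output
for j in range(Tc):
    Kj = K[j * Bc:(j + 1) * Bc]
    Vj = V[j * Bc:(j + 1) * Bc]
    Sj = Q @ Kj.T
    m_new = np.maximum(m, Sj.max(axis=1))
    Pj = np.exp(Sj - m_new[:, None])
    scale = np.exp(m - m_new)          # e^{m^(j-1) - m^(j)}, always <= 1
    l = scale * l + Pj.sum(axis=1)
    O = scale[:, None] * O + Pj @ Vj   # diag(e^{m^(j-1) - m^(j)}) O^(j-1) + P~_j V_j
    m = m_new

O_final = O / l[:, None]               # final diag(l)^{-1} normalization

print(np.allclose(O_final, O_ref))     # True with the corrected (inverse-free) factor
```

Swapping `scale` for `np.exp(m_new - m)` (i.e. keeping the $^{-1}$) makes the `allclose` check fail, which is consistent with the $^{-1}$ being a typo.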