The current FlashAttention implementation (PR #123) is built from relatively fine-grained atomic instructions. To keep these instructions aligned with how data flow analysis naturally reasons about the program, we should review each atomic instruction carefully for its granularity, its performance, and whether its scope is well chosen.