Hi,
I was wondering whether it would be possible to upgrade FlashAttention to v2 in the scGPT model. FlashAttention v2 offers substantial improvements in memory efficiency and throughput, which could help when scaling models like scGPT, especially with large batch sizes or longer sequences. Are there any plans to integrate FlashAttention v2 in a future release?
Thanks for the great work!