Closed: denadai2 closed this issue 5 months ago
Hi,
Thank you for your interest. Since we need a 4D attention mask, but the open-source flash attention implementation only supports a 2D causal attention mask, we chose the standard SDPA path and modified the attention mask on top of it.
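For readers less familiar with the distinction, here is a minimal sketch (not the repository's actual code; shapes and the extra-masking step are illustrative assumptions) of why `torch.nn.functional.scaled_dot_product_attention` can take an arbitrary 4D mask while a flash-attention kernel typically only exposes a boolean causal flag:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes, for illustration only.
batch, n_heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)

# A 4D additive mask of shape (batch, n_heads, seq_len, seq_len):
# 0.0 where attention is allowed, -inf where it is blocked.
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
mask_4d = torch.zeros(batch, n_heads, seq_len, seq_len)
mask_4d.masked_fill_(~causal, float("-inf"))
# ...additional per-sample / per-head edits could be applied here
# (e.g. routing attention through compressed tokens); this is only
# a placeholder for whatever custom pattern the model needs.

# SDPA accepts the full per-head, per-pair mask directly.
out_custom = F.scaled_dot_product_attention(q, k, v, attn_mask=mask_4d)

# A flash-attention kernel, by contrast, generally only takes a boolean
# causal flag, i.e. one 2D lower-triangular pattern shared by every head
# and sample, so an arbitrary 4D mask cannot be expressed there.
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```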
Dear authors, first of all, congratulations on your idea and paper!
I have a question about the code. I see here https://github.com/Yxxxb/VoCo-LLaMA/blob/79859d0ad7df2f322ae7b06c58d246b062f39ffd/llava/model/language_model/llava_llama_1stg.py#L263 that in the flash attention path you do not modify the attention mask. Is that expected?
Thanks!