[Closed] mmmwhy closed this issue 3 months ago
https://github.com/PKU-YuanGroup/Chat-UniVi/blob/main/ChatUniVi/train/llama_flash_attn_monkey_patch.py differs from https://github.com/haotian-liu/LLaVA/blob/main/llava/train/llama_flash_attn_monkey_patch.py.

It seems Chat-UniVi changed some of the code in llama_flash_attn_monkey_patch. Could you explain the reason for these modifications? ♥️
We use standard multi-head attention. Since LLaMA 3 uses grouped-query attention, our guess is that LLaVA updated its patch to follow LLaMA 3. (The main purpose of grouped-query attention is to reduce the size of the KV cache.)
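For readers unfamiliar with the distinction, here is a minimal sketch (not code from either repo; the head counts are purely illustrative) of how grouped-query attention shares a smaller set of key/value heads across groups of query heads. This is why a GQA-aware attention forward has to repeat the K/V tensors before computing attention, and why the KV cache shrinks relative to standard multi-head attention.

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # x: (batch, num_kv_heads, seq_len, head_dim)
    # Expand each KV head n_rep times so it can be shared by a group of query heads.
    bsz, num_kv_heads, seq_len, head_dim = x.shape
    if n_rep == 1:
        return x
    x = x[:, :, None, :, :].expand(bsz, num_kv_heads, n_rep, seq_len, head_dim)
    return x.reshape(bsz, num_kv_heads * n_rep, seq_len, head_dim)

# Toy configuration (hypothetical, not the actual Chat-UniVi / LLaVA settings):
bsz, seq_len, head_dim = 1, 16, 128
num_heads = 32      # query heads; standard MHA would also cache 32 KV heads
num_kv_heads = 8    # grouped-query attention: 4 query heads share each KV head

k = torch.randn(bsz, num_kv_heads, seq_len, head_dim)
v = torch.randn(bsz, num_kv_heads, seq_len, head_dim)

# The KV cache only stores num_kv_heads heads (4x smaller here than standard MHA) ...
print("cached KV elements:", 2 * k.numel())

# ... and the heads are expanded to num_heads only when computing attention.
k_full = repeat_kv(k, num_heads // num_kv_heads)
v_full = repeat_kv(v, num_heads // num_kv_heads)
print("expanded K shape:", k_full.shape)  # (1, 32, 16, 128)
```

A flash-attention monkey patch written for standard multi-head attention can skip the repeat step entirely, which is consistent with the code differences you observed.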