LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0
37.04k stars, 3.23k forks

Update Flash Attention forward for Llama 2: #3595

Closed: jordiclive closed this 1 year ago

jordiclive commented 1 year ago

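Grouped-query attention (GQA), used by the 34B and 70B models, and tensor-parallelism support (tp, `pretraining_tp`) have been added to the transformers Llama attention.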
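With GQA, the 34B and 70B models have fewer key/value heads than query heads, so an attention forward has to map every query head onto a shared KV head. Below is a minimal pure-Python sketch of that mapping, assuming toy nested lists in place of tensors. The helper names `repeat_kv`, `attention`, and `gqa_attention` are illustrative only, not the patch's or the library's code, and tensor parallelism is not covered. Each KV head is repeated `num_heads // num_kv_heads` times, so query head `h` uses KV head `h // n_rep`.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")
Matrix = List[List[float]]  # toy stand-in for a (seq_len, head_dim) tensor


def repeat_kv(kv_heads: Sequence[T], n_rep: int) -> List[T]:
    """Repeat each KV head n_rep times consecutively (illustrative helper)."""
    return [head for head in kv_heads for _ in range(n_rep)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Causal scaled dot-product attention for a single head."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, q_i in enumerate(q):
        # Causal mask: position i only attends to positions 0..i.
        scores = [sum(a * b for a, b in zip(q_i, k[j])) * scale for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(weights[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))])
    return out


def gqa_attention(q_heads: List[Matrix], k_heads: List[Matrix], v_heads: List[Matrix]) -> List[Matrix]:
    """Grouped-query attention: query head h uses KV head h // n_rep."""
    num_heads, num_kv_heads = len(q_heads), len(k_heads)
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    n_rep = num_heads // num_kv_heads
    # Expand the KV heads so each query head has its (shared) K and V.
    k_rep = repeat_kv(k_heads, n_rep)
    v_rep = repeat_kv(v_heads, n_rep)
    return [attention(q, k, v) for q, k, v in zip(q_heads, k_rep, v_rep)]


# Usage: 4 query heads sharing 2 KV heads (n_rep = 2).
q = [[1.0, 0.0], [0.0, 1.0]]
k0, v0 = [[1.0, 0.0], [0.5, 0.5]], [[1.0, 2.0], [3.0, 4.0]]
k1, v1 = [[0.0, 1.0], [1.0, 1.0]], [[5.0, 6.0], [7.0, 8.0]]
out = gqa_attention([q, q, q, q], [k0, k1], [v0, v1])
print(out[0] == out[1], out[2] == out[3])  # prints: True True
```

Query heads in the same group produce identical outputs here only because they were given identical queries; in a real model each query head has its own projection but still shares K and V within its group. A tensor implementation could do the same expansion, or hand the smaller K/V directly to a kernel that accepts fewer KV heads than query heads.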

The current Flash Attention patch forward was written for the old transformers attention forward, so it needs to be updated to match the new one.
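A patch of this kind swaps out the whole `forward`, so it has to reproduce everything the upstream forward now does. The sketch below shows the swap itself using a hypothetical `ToyAttention` class and hypothetical `flash_forward` / `patch_forward` helpers (none of these are Open-Assistant or transformers names). The replacement is bound to one module instance with `types.MethodType`, and it reads the head grouping from the module instead of assuming the pre-GQA layout of one KV head per query head.

```python
import types
from typing import Any, Callable


class ToyAttention:
    """Hypothetical stand-in for an attention module; not the transformers class."""

    def __init__(self, num_heads: int, num_key_value_heads: int, pretraining_tp: int = 1):
        self.num_heads = num_heads
        self.num_key_value_heads = num_key_value_heads
        self.pretraining_tp = pretraining_tp

    def forward(self, hidden_states: Any) -> tuple:
        return ("upstream", hidden_states)


def flash_forward(self: ToyAttention, hidden_states: Any) -> tuple:
    """Hypothetical replacement forward.

    Reads the query/KV head grouping from the module and rejects
    configurations it does not implement, rather than silently
    computing attention with the old one-KV-head-per-query-head layout.
    """
    if self.num_heads % self.num_key_value_heads != 0:
        raise ValueError("num_heads must be a multiple of num_key_value_heads")
    if self.pretraining_tp != 1:
        raise NotImplementedError("pretraining_tp > 1 is not handled in this sketch")
    n_rep = self.num_heads // self.num_key_value_heads
    # A real patch would call the fused attention kernel here.
    return ("patched", hidden_states, n_rep)


def patch_forward(module: Any, new_forward: Callable) -> Any:
    # Bind to this instance only; the class and other instances keep the original forward.
    module.forward = types.MethodType(new_forward, module)
    return module


attn = patch_forward(ToyAttention(num_heads=64, num_key_value_heads=8), flash_forward)
print(attn.forward("h"))                  # ('patched', 'h', 8)
print(ToyAttention(32, 32).forward("h"))  # ('upstream', 'h')
```

The 64 query heads and 8 KV heads in the usage line match the Llama 2 70B config. Raising on `pretraining_tp > 1` is one design choice; the alternative is to reproduce the sliced projections in the replacement forward as well.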