OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
Update Flash Attention forward for Llama 2: #3595
Closed: jordiclive closed this issue 1 year ago
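Grouped-query attention (GQA), used by the 34B and 70B models, and tensor parallelism (tp) have since been added to the Llama attention in transformers. The current Flash Attention patch still replaces the forward of the old transformers attention implementation, so it needs to be updated to match.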
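To illustrate what an updated forward has to account for, here is a minimal pure-Python sketch (no PyTorch or flash-attn). It assumes the KV-head layout used by the `repeat_kv` helper in transformers' Llama code, where each key/value head is repeated `n_rep` times consecutively, and it assumes "tp" refers to the `pretraining_tp` config option, which splits each projection weight into row slices and concatenates the results. The helpers `kv_head_for_query_head`, `linear`, and `tp_linear` are hypothetical names for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def repeat_kv(kv_heads: List[Matrix], n_rep: int) -> List[Matrix]:
    """Expand num_kv_heads heads to num_kv_heads * n_rep heads.

    Each KV head is repeated n_rep times back to back, so query head h
    ends up reading KV head h // n_rep.
    """
    if n_rep == 1:
        return kv_heads
    return [head for head in kv_heads for _ in range(n_rep)]


def kv_head_for_query_head(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Hypothetical helper: which KV head a given query head shares under GQA."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    return q_head // (num_heads // num_kv_heads)


def linear(weight: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def tp_linear(weight: Matrix, x: List[float], pretraining_tp: int) -> List[float]:
    """Hypothetical sketch of pretraining_tp slicing.

    The weight rows are split into pretraining_tp equal slices, each slice is
    applied separately, and the outputs are concatenated. The result is
    mathematically identical to linear(weight, x).
    """
    rows = len(weight)
    if rows % pretraining_tp != 0:
        raise ValueError("weight rows must be divisible by pretraining_tp")
    step = rows // pretraining_tp
    out: List[float] = []
    for i in range(pretraining_tp):
        out.extend(linear(weight[i * step:(i + 1) * step], x))
    return out


if __name__ == "__main__":
    # Assumed Llama 2 70B shape: 64 query heads sharing 8 KV heads (n_rep = 8).
    print(kv_head_for_query_head(63, 64, 8))  # last query head uses KV head 7
```

The key point for a patched forward is that, with GQA, the key and value tensors have fewer heads than the query tensor, so they must either be expanded (as `repeat_kv` does) or passed to a kernel that handles the head grouping itself; the `pretraining_tp` path changes only how the projections are computed, not their result.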