OpenGVLab / LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
https://openlamm.github.io/

add flash attention support in training to save memory and speed up #35

Closed lighten001 closed 11 months ago

lighten001 commented 1 year ago
  1. add src/model/flash_attn_patch.py (a sketch of the patch follows this list)
  2. add a "--use_flash_attn" arg in train.py that swaps in the patched LlamaAttention.forward and LlamaModel._prepare_decoder_attention_mask
  3. when using flash attention, LLaMA's use_cache should be False
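The PR itself does not inline the patch, but the description matches the common monkey-patch pattern for LLaMA training. Below is a minimal sketch of what such a src/model/flash_attn_patch.py could look like, assuming the flash-attn v2 API (`flash_attn_varlen_qkvpacked_func`) and the transformers LLaMA implementation current at the time of this PR; function names like `_flash_attn_forward` and `replace_llama_attn_with_flash_attn` are illustrative, not taken from the PR. For brevity the sketch also assumes unpadded batches (every sequence has length `q_len`); a full patch would unpad with `flash_attn.bert_padding.unpad_input`.

```python
# flash_attn_patch.py -- illustrative sketch, not the PR's actual file
from typing import Optional, Tuple

import torch
import transformers
from einops import rearrange
from flash_attn import flash_attn_varlen_qkvpacked_func
from transformers.models.llama.modeling_llama import apply_rotary_pos_emb


def _flash_attn_forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
    """Drop-in replacement for LlamaAttention.forward (training only, no KV cache)."""
    bsz, q_len, _ = hidden_states.size()
    assert past_key_value is None and not use_cache, "KV cache is not supported by this patch"

    # q/k/v projections -> (bsz, num_heads, q_len, head_dim), same as the original forward
    q = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
    k = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
    v = self.v_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)

    # rotary position embeddings, unchanged from the stock implementation
    cos, sin = self.rotary_emb(v, seq_len=q_len)
    q, k = apply_rotary_pos_emb(q, k, cos, sin, position_ids)

    # pack into (total_tokens, 3, num_heads, head_dim); assumes no padding in the batch
    qkv = torch.stack([q, k, v], dim=2).transpose(1, 3)          # (bsz, q_len, 3, nh, hd)
    qkv = rearrange(qkv, "b s three h d -> (b s) three h d")
    cu_seqlens = torch.arange(
        0, (bsz + 1) * q_len, q_len, dtype=torch.int32, device=qkv.device
    )

    # fused causal attention; the causal mask is built inside the kernel
    out = flash_attn_varlen_qkvpacked_func(qkv, cu_seqlens, q_len, 0.0, causal=True)
    out = rearrange(out, "(b s) h d -> b s (h d)", b=bsz)
    return self.o_proj(out), None, None


def _prepare_decoder_attention_mask(
    self, attention_mask, input_shape, inputs_embeds, past_key_values_length
):
    # FlashAttention handles causal masking itself, so skip building the large
    # additive 4D mask and just pass the 2D key-padding mask through.
    return attention_mask


def replace_llama_attn_with_flash_attn():
    llama = transformers.models.llama.modeling_llama
    llama.LlamaModel._prepare_decoder_attention_mask = _prepare_decoder_attention_mask
    llama.LlamaAttention.forward = _flash_attn_forward
```

With a patch like this, train.py would call `replace_llama_attn_with_flash_attn()` before the LLaMA model is constructed whenever `--use_flash_attn` is passed. Item 3 then follows naturally: the patched forward never returns `past_key_value`, so the model's `use_cache` must be set to False during training.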