allenai / open-instruct


Olmo does not support Flash Attention 2.0 #173

Closed WilliamsToTo closed 3 weeks ago

WilliamsToTo commented 4 weeks ago

When I use LoRA to finetune olmo-7b-instruct with `finetune_lora_with_accelerate.sh`, it reports the error below.

```
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
loading weights file /data/ccu/taof/olmo/olmo_7B_instruct/model.safetensors.index.json
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Traceback (most recent call last):
  File "/home/taof/open-instruct/open_instruct/finetune.py", line 894, in <module>
    main()
  File "/home/taof/open-instruct/open_instruct/finetune.py", line 559, in main
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/taof/open-instruct/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    return model_class.from_pretrained(
  File "/home/taof/open-instruct/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3084, in from_pretrained
    config = cls._check_and_enable_flash_attn_2(config, torch_dtype=torch_dtype, device_map=device_map)
  File "/home/taof/open-instruct/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1267, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: The current architecture does not support Flash Attention 2.0. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
```

I tried changing `use_flash_attention_2=True if args.use_flash_attn else False` to `flash_attention=True if args.use_flash_attn else False`, as mentioned at https://github.com/allenai/OLMo/issues/557#issuecomment-2078369571, but it does not work.
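For context, the failing path is the Flash Attention flag being passed to `from_pretrained` when loading the non `-hf` checkpoint. A minimal sketch of that call (a reconstruction, not the exact code in `finetune.py`; the raw checkpoint needs `trust_remote_code`, and `use_flash_attention_2` is the older transformers keyword):

```python
import torch
from transformers import AutoModelForCausalLM

# Requesting Flash Attention 2 for the raw (non -hf) OLMo checkpoint triggers
# the ValueError shown in the traceback above.
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-Instruct",   # non -hf checkpoint, served via remote code
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,   # older transformers flag, as used in finetune.py
)
```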

Do you know how to fix it?

natolambert commented 4 weeks ago

@WilliamsToTo which OLMo model are you using? Note that some of them end with `-hf`, so please be specific. The current installation requirements don't really support OLMo; that is being updated soon in #151. It would be wonderful if you could try that PR :)

WilliamsToTo commented 3 weeks ago

I downloaded allenai/OLMo-7B-Instruct from Hugging Face. What is the difference between allenai/OLMo-7B-Instruct-hf and allenai/OLMo-7B-Instruct?

natolambert commented 3 weeks ago

TL;DR: the `-hf` versions are natively compatible with Hugging Face Transformers; see https://github.com/huggingface/transformers/pull/29890. In the future, all of them will be compatible. This was a lesson from our first new models. @WilliamsToTo

Try this: https://huggingface.co/allenai/OLMo-7B-Instruct-hf
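Something along these lines should work with the `-hf` checkpoint (an illustrative sketch, not the exact code in open-instruct; assumes flash-attn is installed and a transformers version that accepts `attn_implementation`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The -hf checkpoint uses the native transformers OLMo implementation,
# so Flash Attention 2 can be requested directly.
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-Instruct-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-Instruct-hf")
```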

WilliamsToTo commented 3 weeks ago

Got it, thanks a lot. I'm now using https://huggingface.co/allenai/OLMo-7B-Instruct-hf, and it supports Flash Attention.
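For anyone who finds this later, a minimal LoRA setup against the `-hf` checkpoint might look like the sketch below; the target modules and hyperparameters are illustrative assumptions, not the settings from `finetune_lora_with_accelerate.sh`:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-Instruct-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Illustrative LoRA config; target module names assume the HF OLMo attention
# layout (q_proj/k_proj/v_proj/o_proj) and are not copied from the script.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```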