Closed: johnhalloran321 closed this issue 2 months ago.
Thank you for your request, @johnhalloran321. OLMo itself already supports flash attention - https://github.com/allenai/OLMo/blob/main/olmo/model.py#L525. However, I'm trying to figure out why this error is raised by the transformers library. I was able to reproduce it, by the way. Stay tuned.
@johnhalloran321, could you please remove the `use_flash_attention_2` argument and instead pass `flash_attention=True`? With that change, the error went away on my end. Please let me know if that resolves it.
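For example, a minimal sketch of the suggested change, assuming the checkpoint is loaded through `transformers`' Auto classes:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B",
    trust_remote_code=True,  # assumption: the checkpoint loads its modeling code from the Hub
    flash_attention=True,    # OLMo's native flag, forwarded to its model config
)
```

Unlike `use_flash_attention_2`, which goes through transformers' own Flash Attention 2 integration, `flash_attention=True` is forwarded to OLMo's model config and enables its built-in flash-attention path.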
Hi @dumitrac,
Thanks for the quick response; that looks to have fixed things. Closing the issue.
Best
🚀 The feature, motivation and pitch
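A minimal way to trigger this (a sketch, assuming the model is loaded through `transformers`' Auto classes with remote code enabled; the exact arguments are assumed from the thread):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B",
    trust_remote_code=True,      # assumed; OLMo-1B ships its modeling code on the Hub
    use_flash_attention_2=True,  # transformers' FA2 switch -- raises the error below
)
```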
Results in:
ValueError: OLMoForCausalLM does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co/allenai/OLMo-1B/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
Could support for Flash Attention 2.0 be added to OLMo-1B/7B?