allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0
4.79k stars, 488 forks

[HF OLMo] Add flash attention and gradient checkpointing support #719

Closed. 2015aroras closed this 2 months ago

2015aroras commented 2 months ago

This PR adds support for flash attention and gradient checkpointing to the hf_olmo library.
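For reviewers who want to try the change, here is a minimal usage sketch. It assumes the PR exposes both features through the standard `transformers` entry points (`attn_implementation="flash_attention_2"` at load time and `gradient_checkpointing_enable()` on the model); the PR description does not confirm this. The helper names `build_load_kwargs` and `load_olmo` and the checkpoint name are illustrative only.

```python
from typing import Any, Dict


def build_load_kwargs(use_flash_attention: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: collect the keyword arguments for from_pretrained."""
    kwargs: Dict[str, Any] = {}
    if use_flash_attention:
        # Standard transformers switch for FlashAttention-2. Whether hf_olmo
        # honors it is an assumption, not something the PR text states.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_olmo(
    model_name: str = "allenai/OLMo-7B",  # assumed checkpoint name
    use_flash_attention: bool = True,
    use_gradient_checkpointing: bool = True,
):
    """Hypothetical helper: load an OLMo model with both features enabled."""
    # Imported lazily so the kwargs helper can be used without these packages.
    import torch
    import hf_olmo  # noqa: F401  (registers the OLMo classes with transformers)
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        # FlashAttention-2 needs half precision (fp16/bf16) and a CUDA GPU.
        torch_dtype=torch.bfloat16,
        **build_load_kwargs(use_flash_attention),
    )
    if use_gradient_checkpointing:
        # Recompute activations in the backward pass to save memory.
        model.gradient_checkpointing_enable()
    return model
```

Calling `load_olmo()` downloads the checkpoint and requires `flash-attn` to be installed; pass `use_flash_attention=False` to fall back to the default attention path.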

Tested by @natolambert, who ran it with Open Instruct.