Closed: 2015aroras closed this PR 2 months ago.
This PR adds support for flash attention and gradient checkpointing to the `hf_olmo` library.
Tested by @natolambert, who ran it on Open Instruct.