allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Gradient Checkpointing #549

Open fakerybakery opened 3 months ago

fakerybakery commented 3 months ago

Hi, I'm trying to fine-tune OLMo but I'm running into this error: `ValueError: OLMoForCausalLM does not support gradient checkpointing.` Is support for this planned?

Thanks for releasing OLMo!

2015aroras commented 3 months ago

We just released an OLMo integration in the transformers library (v4.40.0 and up), with corresponding -hf checkpoints on the Hugging Face Hub (e.g. https://huggingface.co/allenai/OLMo-1.7-7B-hf). I haven't tried gradient checkpointing there, but it may work.
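
For anyone who wants to check this before starting a full fine-tuning run, here is a minimal sketch. It assumes the loaded model is a transformers `PreTrainedModel`, which exposes a `supports_gradient_checkpointing` attribute and a `gradient_checkpointing_enable()` method that raises `ValueError` when the architecture has no support. The helper names `try_enable_gradient_checkpointing` and `load_and_check` are made up for illustration, and nothing here is a claim about what the -hf checkpoints actually return.

```python
def try_enable_gradient_checkpointing(model):
    """Try to turn on gradient checkpointing for a transformers-style model.

    Returns True if it was enabled, False if the model reports no support
    or raises ValueError when asked to enable it.
    """
    # PreTrainedModel subclasses set this class attribute to True when
    # the architecture implements gradient checkpointing.
    if not getattr(model, "supports_gradient_checkpointing", False):
        return False
    try:
        model.gradient_checkpointing_enable()
    except ValueError:
        # Same error class as the one reported in this issue.
        return False
    return True


def load_and_check(model_id="allenai/OLMo-1.7-7B-hf"):
    """Hypothetical helper: load a checkpoint and report whether
    gradient checkpointing could be enabled.

    Requires transformers >= 4.40.0 and downloads the full weights.
    """
    from transformers import AutoModelForCausalLM  # imported lazily

    model = AutoModelForCausalLM.from_pretrained(model_id)
    return try_enable_gradient_checkpointing(model)
```

If `try_enable_gradient_checkpointing` returns False, training has to proceed without checkpointing (for example with a smaller batch size) until support is added.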

bdytx5 commented 2 months ago

I confirmed it does not work. This would be a great addition.