allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Does global_train_batch_size support gradient accumulation? #672

Open jinzhuoran opened 1 month ago

jinzhuoran commented 1 month ago

❓ The question

Hello authors, thank you very much for your inspiring work. I have 8 A100s. If I want to continue pretraining the model from a certain checkpoint, can I keep global_train_batch_size at the original 2048 and set device_train_microbatch_size to 2? Would this be equivalent to using more GPUs?
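For context, here is a minimal sketch of how these two settings interact, assuming (as in most data-parallel trainers) that the global batch is split evenly across ranks and gradients are accumulated over microbatches until each rank has processed its share of the global batch:

```python
# Sketch of the gradient-accumulation arithmetic implied by the question.
# Assumes the trainer divides the global batch evenly across GPUs and
# accumulates microbatch gradients before each optimizer step.

global_train_batch_size = 2048      # sequences per optimizer step, across all GPUs
world_size = 8                      # number of A100s
device_train_microbatch_size = 2    # sequences per forward/backward pass per GPU

# Each GPU is responsible for an equal slice of the global batch.
device_train_batch_size = global_train_batch_size // world_size  # 256

# Gradients are accumulated over this many microbatches before each
# optimizer step, so the effective batch size stays at 2048.
grad_accum_steps = device_train_batch_size // device_train_microbatch_size  # 128

print(f"per-device batch: {device_train_batch_size}, "
      f"accumulation steps: {grad_accum_steps}")
```

Under these assumptions the optimizer sees the same effective batch size of 2048 regardless of GPU count; fewer GPUs simply means more accumulation steps (and more wall-clock time) per optimizer step.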

AkshitaB commented 1 month ago

@jinzhuoran Yes, this should be possible. Have you faced an issue when trying this?