EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

Automatically compute train_iters when train_epochs is specified. #1283

Closed AI-WAIFU closed 1 month ago

AI-WAIFU commented 2 months ago

Major changes:

Note: Most of these changes follow from the fact that train_iters cannot be computed at the time the NeoX Args object is created. At a high level, we pass both train_epochs and train_iters down to the dataloader and use whichever one is not None to determine the dataloader's behavior; if train_iters is unspecified, we then infer it from the dataloader after it has been constructed, as sketched below.
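A minimal sketch of that flow, with hypothetical names (`build_dataloader`, `resolve_train_iters`) that are illustrative only and not the actual gpt-neox API:

```python
# Hypothetical sketch of the described resolution logic; the function and
# argument names here are assumptions, not gpt-neox's real interfaces.

def resolve_train_iters(neox_args, build_dataloader):
    """Pass both train_epochs and train_iters to the dataloader, size it by
    whichever is not None, and back-fill train_iters if it was unspecified."""
    assert (neox_args.train_iters is None) != (neox_args.train_epochs is None), \
        "exactly one of train_iters / train_epochs should be set"

    # The dataloader uses whichever argument is not None to determine
    # how many samples it will yield.
    dataloader = build_dataloader(
        train_iters=neox_args.train_iters,
        train_epochs=neox_args.train_epochs,
    )

    # If only train_epochs was given, the iteration count is only knowable
    # once the dataloader (and hence the dataset length) exists.
    if neox_args.train_iters is None:
        neox_args.train_iters = len(dataloader)

    return dataloader
```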

Quentin-Anthony commented 2 months ago

fixes https://github.com/EleutherAI/gpt-neox/issues/1268