Major changes:
Reorders the steps in the pretrain function, from: model/optimizer creation -> data iterator creation, to: data loader creation -> model/optimizer creation -> data iterator creation.
Removes the save_iters property from the NeoX args and replaces it with an is_save_iter function that is evaluated at runtime.
Adds support for non-integer checkpoint factors when using logarithmic checkpointing.
Note: Most of these changes follow from the fact that train_iters can no longer be computed when the NeoX Args object is created. At a high level, both train_epochs and train_iters are passed down to the dataloader, and whichever one is not None determines the dataloader's behavior. If train_iters is unspecified, it is inferred from the dataloader after the dataloader is constructed.
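As a rough illustration of the flow described in the note, the sketch below infers train_iters from an already-built dataloader and then decides at runtime whether an iteration is a save iteration. This is not the code from this PR: the resolve_train_iters helper, both function signatures, the "linear"/"log" scale names as used here, and the assumption that len(dataloader) equals the number of batches per epoch are all hypothetical and chosen for illustration only.

```python
from typing import Optional, Sized


def resolve_train_iters(
    train_iters: Optional[int],
    train_epochs: Optional[int],
    dataloader: Sized,
) -> int:
    """Return the iteration budget, inferring it from the dataloader if needed.

    Hypothetical helper: exactly one of train_iters / train_epochs must be set.
    """
    if (train_iters is None) == (train_epochs is None):
        raise ValueError("Specify exactly one of train_iters or train_epochs")
    if train_iters is not None:
        return train_iters
    # Assumption: len(dataloader) is the number of batches in one epoch.
    return train_epochs * len(dataloader)


def is_save_iter(iteration: int, checkpoint_factor: float, scale: str = "linear") -> bool:
    """Decide at runtime whether a checkpoint should be written at `iteration`.

    Hypothetical signature. With scale="log", checkpoints fall on
    int(checkpoint_factor ** k) for k = 0, 1, 2, ..., so the factor
    does not have to be an integer.
    """
    if scale == "linear":
        if not float(checkpoint_factor).is_integer():
            raise ValueError("linear checkpointing needs a whole-number factor")
        return iteration > 0 and iteration % int(checkpoint_factor) == 0
    if scale == "log":
        if checkpoint_factor <= 1:
            raise ValueError("log checkpointing needs a factor greater than 1")
        power = 1.0
        # Walk the powers of the factor until they pass the current iteration.
        while int(power) <= iteration:
            if int(power) == iteration:
                return True
            power *= checkpoint_factor
        return False
    raise ValueError(f"unknown scale: {scale!r}")


if __name__ == "__main__":
    batches = [None] * 10  # stand-in for a dataloader with 10 batches per epoch
    total = resolve_train_iters(train_iters=None, train_epochs=2, dataloader=batches)
    saves = [i for i in range(1, total + 1) if is_save_iter(i, 1.5, scale="log")]
    print(total, saves)
```

Because the save check only needs the current iteration, it does not depend on train_iters being known when the arguments are parsed, which is the constraint that a precomputed save_iters list would run into.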