What does this PR do?
We can now specify the number of batches per rank per dataloader directly in the config. Whenever we don't want to iterate over the entire dataloader, e.g., for benchmarking or smaller models, we can set `fixed_num_batches` in the dataloader config. Additionally, in the settings, we can add `global_num_train_tokens`, which specifies the global number of training tokens after which training stops. We have implemented a conversion routine that calculates `fixed_num_batches` from `global_num_train_tokens`.
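A minimal sketch of what such a conversion could look like. The function name and its parameters (`sequence_length`, `local_batch_size`, `world_size`) are illustrative assumptions, not the actual implementation or config keys from this PR:

```python
def compute_fixed_num_batches(
    global_num_train_tokens: int,
    sequence_length: int,
    local_batch_size: int,
    world_size: int,
) -> int:
    """Convert a global token budget into the number of batches each rank
    draws from its dataloader (hypothetical sketch, not the PR's code).

    Each batch on a single rank contains local_batch_size * sequence_length
    tokens, and all world_size ranks step in lockstep, so the budget is
    divided by the tokens consumed per global step.
    """
    tokens_per_global_step = world_size * local_batch_size * sequence_length
    # Integer division: a trailing remainder smaller than one full global
    # step is dropped, so every rank sees the same fixed batch count.
    return global_num_train_tokens // tokens_per_global_step


# Example: a 1B-token budget with sequence length 2048,
# batch size 32 per rank, and 8 ranks.
assert compute_fixed_num_batches(1_000_000_000, 2048, 32, 8) == 1907
```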
General Changes
See above.
Breaking Changes
None
Checklist before submitting final PR
[x] My PR is minimal and addresses one issue in isolation
[x] I have merged the latest version of the target branch into this feature branch
[x] I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
[x] I have run a sample config for model training
[x] I have checked that all tests run through (python tests/tests.py)