What does this PR do?
We can now specify the number of batches per rank per dataloader directly in the config. Whenever we don't want to iterate over the entire dataloader, e.g., for benchmarking or smaller models, we can set `fixed_num_batches` in the dataloader config. Additionally, in the settings, we can add `global_num_train_tokens`, which specifies the global number of training tokens after which training stops. We have implemented a conversion routine that calculates `fixed_num_batches` from `global_num_train_tokens`.
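A minimal sketch of what such a conversion could look like. The function name and its parameters (`sequence_length`, `local_batch_size`, `world_size`) are illustrative assumptions, not the actual implementation or config keys from this PR:

```python
def compute_fixed_num_batches(
    global_num_train_tokens: int,
    sequence_length: int,
    local_batch_size: int,
    world_size: int,
) -> int:
    """Convert a global token budget into the number of batches each rank
    draws from its dataloader (hypothetical sketch, not the PR's code).

    Each batch on a single rank contains local_batch_size * sequence_length
    tokens, and all world_size ranks step in lockstep, so the budget is
    divided by the tokens consumed per global step.
    """
    tokens_per_global_step = world_size * local_batch_size * sequence_length
    # Integer division: a trailing remainder smaller than one full global
    # step is dropped, so every rank sees the same fixed batch count.
    return global_num_train_tokens // tokens_per_global_step


# Example: a 1B-token budget with sequence length 2048,
# batch size 32 per rank, and 8 ranks.
assert compute_fixed_num_batches(1_000_000_000, 2048, 32, 8) == 1907
```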
General Changes
See above.
Breaking Changes
None
Checklist before submitting final PR
[x] My PR is minimal and addresses one issue in isolation
[x] I have merged the latest version of the target branch into this feature branch
[x] I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
[x] I have run a sample config for model training
[x] I have checked that all tests run through (python tests/tests.py)