Previously, the logging was performed on all local ranks == 0, leading to excessive logging in multi-node training. We fixed this issue and log now only on a single rank, i.e., global rank == 0.
General Changes
see above
Breaking Changes
The config now expects global ranks for the loggers now. See the changes her.
Checklist before submitting final PR
[x] My PR is minimal and addresses one issue in isolation
[x] I have merged the latest version of the target branch into this feature branch
[x] I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
[x] I have run a sample config for model training
[x] I have checked that all tests run through (python tests/tests.py)
What does this PR do?
Previously, the logging was performed on all local ranks == 0, leading to excessive logging in multi-node training. We fixed this issue and log now only on a single rank, i.e., global rank == 0.
General Changes
Breaking Changes
Checklist before submitting final PR
python tests/tests.py
)