EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

Runtime per step linearly increases with training step number. #1322

Open iPRET opened 1 week ago

iPRET commented 1 week ago

Describe the bug: Training step time increases linearly with the training step number.

To Reproduce: Training was launched via Slurm. My training config was: config.yml.txt The script run by sbatch was: launch.sh.txt

Expected behavior: runtime per train step should stay roughly constant. Observed behavior: runtime per train step increases linearly with the train step. [Plot: optimizer step time (ms) vs. iteration, data parsed from training logs] [Plot: forward step time (ms) vs. iteration] [Plot: samples per second vs. iteration]
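For reference, the trend in the plots above can be quantified directly from the training logs. The sketch below is a minimal, hedged example: the log line format and field names are assumptions (adjust the regex to whatever your slurm3.txt actually contains), but the least-squares fit shows how to turn "looks linear" into a slope in ms per iteration.

```python
import re

# Hypothetical log lines mimicking per-iteration timing output; the exact
# format here is an assumption -- adapt the regex to your real training log.
SAMPLE_LOG = """\
iteration 100 | optimizer_step time (ms): 210.0
iteration 200 | optimizer_step time (ms): 420.5
iteration 300 | optimizer_step time (ms): 629.8
iteration 400 | optimizer_step time (ms): 841.2
"""

PATTERN = re.compile(
    r"iteration\s+(\d+)\s+\|\s+optimizer_step time \(ms\):\s+([\d.]+)"
)

def parse_step_times(text):
    """Return parallel lists of (iteration, step time in ms) from a log."""
    pairs = [(int(m.group(1)), float(m.group(2)))
             for m in PATTERN.finditer(text)]
    iters = [i for i, _ in pairs]
    times = [t for _, t in pairs]
    return iters, times

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept (pure stdlib, no numpy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

if __name__ == "__main__":
    iters, times = parse_step_times(SAMPLE_LOG)
    slope, intercept = linear_fit(iters, times)
    # A clearly positive slope (ms gained per iteration) confirms the
    # per-step runtime is growing rather than hovering around a constant.
    print(f"slope: {slope:.3f} ms/iteration, intercept: {intercept:.1f} ms")
```

On a healthy run the fitted slope should sit near zero; a steady positive slope like the one in the sample data is the symptom described in this issue.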

Environment: Running on the LUMI supercomputer. GPUs: single-node run with 8x AMD MI250X GPUs. Pip list: pip-list.txt Python version: 3.6.15 DeeperSpeed was also modified to wrap each launched training process in a Singularity container via srun.

Additional info: Full training log: slurm3.txt

Any ideas what might be the issue?

iPRET commented 1 week ago

Update: this did not reproduce with the same config on a different server with NVIDIA L40S GPUs, so the problem is likely something in the environment.