Should be merged first: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/239
Only adds tools/convert_checkpoint/deepspeed_to_deepspeed_nozero.py
Our small models are trained without ZeRO. This script makes it possible to reshape their checkpoints.
Tests:
Loss continues where it left off after reshaping:
PP=4, TP=4 -> PP=2, TP=2 👍
PP=4, TP=4 -> PP=1, TP=1 👍
PP=2, TP=1 -> PP=1, TP=1 👍
Checkpoint size stays the same 👍
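
For anyone who wants to repeat the size check, here is a minimal sketch of one way to compare two checkpoint directories on disk. It is not part of this PR; the function names, the directory arguments, and the 1% tolerance are assumptions made for illustration. It only sums file sizes and does not inspect tensor contents.

```python
import os


def checkpoint_size_bytes(checkpoint_dir):
    """Sum the sizes of all regular files under checkpoint_dir (recursive)."""
    total = 0
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def sizes_match(dir_a, dir_b, rel_tol=0.01):
    """Return True if the two directories have the same total size.

    rel_tol is a hypothetical tolerance: per-file serialization overhead
    can change slightly when the number of shard files changes.
    """
    a = checkpoint_size_bytes(dir_a)
    b = checkpoint_size_bytes(dir_b)
    return abs(a - b) <= rel_tol * max(a, b, 1)
```

Calling `sizes_match(original_dir, reshaped_dir)` with the paths of the checkpoint before and after reshaping gives a quick sanity check; a loss curve that continues where it left off remains the stronger test.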
Notes:
I'm not applying black formatting etc., as this is not a production codebase. Let me know if that's not okay and the code should be cleaner!