bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

No-ZeRO reshaping #289

Status: Open. Muennighoff opened this pull request 2 years ago.

Muennighoff commented 2 years ago

Should be merged first: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/239. This PR only adds `tools/convert_checkpoint/deepspeed_to_deepspeed_nozero.py`.

Our small models are trained without ZeRO. This script enables reshaping their checkpoints, i.e. converting them to a different tensor/pipeline-parallel layout without going through a ZeRO-aware conversion path.
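For context, a no-ZeRO reshape of a tensor-parallel parameter boils down to concatenating the per-rank shards back into the full tensor and re-splitting it to the target degree. The sketch below illustrates that idea only; the function name, the split dimension, and the checkpoint handling are assumptions for illustration, not the actual API of `deepspeed_to_deepspeed_nozero.py`.

```python
# Minimal sketch of no-ZeRO tensor-parallel reshaping (illustrative, not the
# script's actual API). Assumes a parameter partitioned along `dim` across
# the old TP ranks, e.g. a column-parallel linear weight split on dim 0.
import torch

def reshape_tp_param(shards, new_tp, dim=0):
    """Merge `shards` (one tensor per old TP rank, in rank order) and
    re-split the full tensor into `new_tp` equal partitions along `dim`."""
    full = torch.cat(shards, dim=dim)  # undo the old partitioning
    assert full.size(dim) % new_tp == 0, "param size must divide the new TP degree"
    return list(torch.chunk(full, new_tp, dim=dim))

# Example: reshape an 8x4 column-parallel weight from TP=2 to TP=4.
old_shards = [torch.randn(4, 4) for _ in range(2)]
new_shards = reshape_tp_param(old_shards, new_tp=4)
assert all(s.shape == (2, 4) for s in new_shards)
```

Because there is no ZeRO partitioning to undo, only the model-parallel sharding itself has to be reversed and reapplied, which is what makes a dedicated no-ZeRO script feasible.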

Tests:

Notes: