bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

deepspeed_to_megatron: several issues #355

Open · MatejUlcar opened this issue 1 year ago

MatejUlcar commented 1 year ago
  1. A recent commit removed `tools/convert_checkpoint/deepspeed_checkpoint.py`, but `tools/convert_checkpoint/deepspeed_to_megatron.py` still tries to import it (the other scripts in the folder appear to be fine). I guess the import on line 7 of that script should be changed from `from .deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint` to `from deepspeed.checkpoint.deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint`? See the snippet after this list.

  2. https://github.com/microsoft/Megatron-DeepSpeed/issues/91 applies here as well.

  3. The function `_renest_sd`, defined on line 90, splits state-dict keys on dots, but I encounter `ValueError: too many values to unpack (expected 2)` because one of the keys, `word_embeddings.norm.weight`, contains two dots; there are probably more such layer names. A possible workaround is sketched after this list. I'm attempting to convert my own model's bf16_zero (pp=1, tp=1) checkpoints to Megatron and/or HF format. It's entirely possible this is all my own doing; if so, please advise how to convert successfully.
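
For reference, the import change proposed in (1) as a snippet. Note this assumes the installed DeepSpeed version exposes `ARGS_KEY` and `DeepSpeedCheckpoint` from `deepspeed.checkpoint.deepspeed_checkpoint`, which I have not verified:

```python
# tools/convert_checkpoint/deepspeed_to_megatron.py, line 7
# Old import (the local module was removed by the recent commit):
# from .deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint
# Proposed replacement, importing from the DeepSpeed package instead:
from deepspeed.checkpoint.deepspeed_checkpoint import ARGS_KEY, DeepSpeedCheckpoint
```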
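
And for (3), a minimal sketch of the failure and one possible workaround. The function body below is a hypothetical reconstruction (only the name `_renest_sd` and its location at line 90 come from the script); the `maxsplit=1` fix assumes the downstream loader only expects one level of nesting:

```python
from collections import OrderedDict

def _renest_sd(sd):
    # Hypothetical reconstruction of the helper at line 90 of
    # tools/convert_checkpoint/deepspeed_to_megatron.py.
    new_sd = OrderedDict()
    for key, value in sd.items():
        # Assumed original: a, b = key.split('.')
        # That raises ValueError: too many values to unpack (expected 2)
        # for any key with more than one dot, e.g. 'word_embeddings.norm.weight'.
        a, b = key.split('.', 1)  # split on the first dot only
        # setdefault merges keys that share a top-level prefix instead of
        # overwriting the sub-dict on each iteration.
        new_sd.setdefault(a, OrderedDict())[b] = value
    return new_sd

# With the workaround, the offending key nests as
# {'word_embeddings': {'norm.weight': ...}} instead of raising.
```

Whether `norm.weight` should remain a flat sub-key under `word_embeddings` or be re-nested recursively depends on what the Megatron loader expects, so treat this as a starting point rather than a verified fix.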

huybery commented 1 year ago

same question.

Rosenberg37 commented 1 year ago

same question.

yzzzwd commented 1 year ago

same question.

pizts commented 1 week ago

same question.