bigscience-workshop / xmtf

Crosslingual Generalization through Multitask Finetuning
https://arxiv.org/abs/2211.01786
Apache License 2.0

How to convert Megatron-DeepSpeed checkpoints to Hugging Face checkpoints? #11

Closed: huybery closed this 1 year ago

huybery commented 1 year ago

I tried to use https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/tools/convert_checkpoint/deepspeed_to_transformers.py to convert checkpoints, but I ran into this issue: https://github.com/bigscience-workshop/Megatron-DeepSpeed/issues/355. Could you help me resolve it? Thanks!

Muennighoff commented 1 year ago

If it's a BLOOM model, you can use this: https://github.com/huggingface/transformers/blob/main/src/transformers/models/bloom/convert_bloom_original_checkpoint_to_pytorch.py
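For reference, a typical invocation looks roughly like the following. The paths are placeholders, and the flag names reflect my reading of the script's argparse setup, so confirm them against --help on your transformers version:

```bash
# Hypothetical paths; confirm the flags with:
#   python convert_bloom_original_checkpoint_to_pytorch.py --help
python convert_bloom_original_checkpoint_to_pytorch.py \
  --bloom_checkpoint_path /path/to/bloom/global_stepXXXXX \
  --bloom_config_file /path/to/config.json \
  --pytorch_dump_folder_path /path/to/output \
  --shard_model
```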

Otherwise, I have been using the command below with the following repo successfully: https://github.com/TurkuNLP/Megatron-DeepSpeed

```bash
python Megatron-DeepSpeed/tools/convert_checkpoint/deepspeed_to_transformers.py \
  --input_folder checkpoints_2b855b55bc4ul2ndfixnew/global_step52452 \
  --output_folder lm5-2b8-55b-c4/transformers \
  --target_tp 1 \
  --target_pp 1
```
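Afterwards it's worth a quick sanity check that the converted folder actually loads in Transformers. A minimal sketch, assuming the --output_folder from the command above and that the converted config maps to a causal-LM class:

```bash
# Load the converted checkpoint once to verify the conversion round-trips.
python -c "
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('lm5-2b8-55b-c4/transformers')
print(model.config)
"
```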

huybery commented 1 year ago

@Muennighoff Thanks for your reply! Could you tell me which version of Transformers you used?

Muennighoff commented 1 year ago

I'm using the latest one, i.e. 4.26.1.
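If you want to reproduce my setup exactly, pinning it should work:

```bash
pip install transformers==4.26.1
```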

huybery commented 1 year ago

Thanks! The first method succeeded, but the second one still fails :( I hit a `ValueError: too many values to unpack (expected 2)`.
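(For anyone landing here with the same traceback: the error class itself is just Python's tuple unpacking receiving extra values, e.g.:

```bash
# Minimal reproduction of the error class, not the converter's actual code:
python -c "a, b = (1, 2, 3)"
# ValueError: too many values to unpack (expected 2)
```

which suggests the converter parsed a checkpoint entry into more fields than it expected; a version mismatch with the TurkuNLP repo above is one plausible cause, though that's a guess.)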