bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

how to convert huggingface model to megatron-deepspeed? #329

Closed yayaQAQ closed 1 year ago

yayaQAQ commented 2 years ago

As the title says.

mayank31398 commented 2 years ago

This is not possible. To download DS checkpoints refer to this issue: https://github.com/bigscience-workshop/Megatron-DeepSpeed/issues/319

yayaQAQ commented 2 years ago

> This is not possible. To download DS checkpoints refer to this issue: #319

Why? So I have to train from scratch? That's hard.

mayank31398 commented 2 years ago

I don't understand the issue. Do you just need to run inference? If that is the case, DS-inference is compatible with all Huggingface models.
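If inference is all that's needed, a minimal sketch of wrapping a Hugging Face model with DS-inference might look like the following. This assumes `deepspeed` and `transformers` are installed and a CUDA GPU is available; the model name `gpt2` and the generation settings are illustrative choices, not anything from this repository.

```python
# Minimal sketch: DS-inference over a Hugging Face causal LM.
# Assumes deepspeed + transformers + a CUDA GPU; "gpt2" is illustrative.
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the model with the DeepSpeed inference engine; kernel injection
# swaps supported modules for fused inference kernels.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                      # tensor-parallel degree
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda")
outputs = engine.module.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this serves the unmodified Hugging Face checkpoint directly, with no conversion to a Megatron-DeepSpeed checkpoint involved.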

AnShengqiang commented 2 years ago

> I don't understand the issue. Do you just need to run inference? If that is the case, DS-inference is compatible with all Huggingface models.

Hello, I have the same problem.

I want to load a Hugging Face model as pre-trained weights and continue training it with the Megatron-DeepSpeed framework.

But I haven't found a way to convert Hugging Face weights into Megatron-DeepSpeed weights.

I look forward to your help. Thank you.

AnShengqiang commented 2 years ago

By the way:

Model structure: GPT. Model link: https://huggingface.co/TsinghuaAI/CPM-Generate

I want to train the model with pipeline parallelism of degree 4 under DeepSpeed.
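For reference, a launch of that shape might be sketched as below. The flag names follow this repository's `pretrain_gpt.py` conventions, but every path and model size here is a placeholder, not a tested configuration.

```shell
# Hypothetical sketch: pretraining with pipeline-parallel degree 4
# under DeepSpeed. Paths, sizes, and iteration counts are placeholders.
deepspeed pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 4 \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --micro-batch-size 4 \
    --global-batch-size 64 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --train-iters 100000 \
    --data-path /path/to/dataset \
    --deepspeed \
    --deepspeed_config ds_config.json
```

The open question in this thread is how to initialize such a run from Hugging Face weights rather than from scratch.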

mayank31398 commented 2 years ago

@AnShengqiang It's non-trivial to convert models for training; as far as I know, people are actively exploring this. This repository saves what is called a universal checkpoint, which can be converted into other checkpoint formats. However, I'm quite new here, so I don't really know how that works.
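As a hedged pointer on the universal-checkpoint direction: DeepSpeed ships a `ds_to_universal.py` script that folds a sharded ZeRO/3D-parallel checkpoint into a "universal" layout that can be reloaded under a different parallelism configuration. The paths below are placeholders, and this converts Megatron-DeepSpeed checkpoints between parallelism layouts; it does not, by itself, convert a Hugging Face checkpoint.

```shell
# Hedged sketch: fold a sharded DeepSpeed checkpoint into the universal
# layout. Check deepspeed/checkpoint/ in your install for the actual
# script location and options; paths here are placeholders.
python deepspeed/checkpoint/ds_to_universal.py \
    --input_folder  checkpoints/gpt/global_step1000 \
    --output_folder checkpoints/gpt/global_step1000_universal
```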

AnShengqiang commented 2 years ago

> @AnShengqiang It's non-trivial to convert models for training; as far as I know, people are actively exploring this. This repository saves what is called a universal checkpoint, which can be converted into other checkpoint formats. However, I'm quite new here, so I don't really know how that works.

Thank you for your reply. I'll go look for an answer; if there is good news, I will post it here.

stgzr commented 1 year ago

> > I don't understand the issue. Do you just need to run inference? If that is the case, DS-inference is compatible with all Huggingface models.
>
> Hello, I have the same problem.
>
> I want to load a Hugging Face model as pre-trained weights and continue training it with the Megatron-DeepSpeed framework.
>
> But I haven't found a way to convert Hugging Face weights into Megatron-DeepSpeed weights.
>
> I look forward to your help. Thank you.

Same problem here. Are there any tools that can do this?