EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

How to convert a model parallel model to hugging face model? #880

Closed: guozhiyao closed this issue 1 year ago

guozhiyao commented 1 year ago

Is your feature request related to a problem? Please describe. I trained a model with "model-parallel-size": 2 and tried to convert it to a Huggingface model. I followed tools/convert_to_hf.py for the conversion; the converted model loads its parameters without errors, but the generated text is random. When I train a model with "model-parallel-size": 1 and convert and generate with the same code, the output is normal, so I suspect the conversion code is at fault.
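
A minimal way to reproduce the symptom on a converted checkpoint (the path and prompt below are placeholders, not my exact setup) is:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path -- substitute the output directory produced by the
# conversion script (it must also contain the tokenizer files).
model = AutoModelForCausalLM.from_pretrained("./converted_hf_model")
tokenizer = AutoTokenizer.from_pretrained("./converted_hf_model")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# Greedy decoding keeps the check deterministic; gibberish here suggests
# the weights were not mapped correctly during conversion.
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0]))
```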

haileyschoelkopf commented 1 year ago

Hi! I can look into this--you should be able to convert this to Huggingface without issues regardless of model-parallel size with our current scripts, so this is surprising and concerning.

What conversion script are you running, and what commit of this repository and DeepSpeed version are you using?

guozhiyao commented 1 year ago

The commit is 7d682df, the DeepSpeed version is 0.7.5, and I am using tools/convert_to_hf.py. Also, I use RMSNorm; how should its parameters be merged?

haileyschoelkopf commented 1 year ago

Hi! Thanks for sharing. A couple things:

  1. If you're using DeepSpeed 0.7.5, you should try running tools/convert_sequential_to_hf.py off of the current main branch! That conversion script includes changes needed to work with v2.0 of our library onwards.

  2. If you are using RMSNorm, it's very surprising to me that your model converts properly to Huggingface format when MP=1, since HF doesn't support RMSNorm in GPTNeoXModel to my knowledge. I believe you'd have to write a custom version of the Huggingface code to support RMSNorm first, and then make sure to update the script to port RMSNorm properly even with MP>1.
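
To illustrate what such a custom version would need, a minimal RMSNorm module that could stand in for the LayerNorm calls in the HF GPTNeoX code might look like the sketch below (illustrative only, not code from either repository):

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm: no mean subtraction and no bias, only a learned per-dimension scale."""

    def __init__(self, hidden_size: int, eps: float = 1e-8):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the hidden dimension,
        # computed in fp32 for numerical stability.
        norm = x.float().pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return (self.scale * x.float() * norm).to(x.dtype)
```

When porting MP>1 checkpoints, the RMSNorm scale itself is typically replicated across ranks rather than sharded, so the conversion script only needs to copy it once.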

guozhiyao commented 1 year ago

I modified the Huggingface code to support RMSNorm in GPTNeoX. I trained the model with pipeline parallelism (PP) instead of model parallelism (MP), loaded the parameters into the Huggingface model following tools/merge20b.py at 7d682df, and inference is normal.
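
(For anyone hitting the same thing: the core of a merge20b.py-style merge is deciding, per parameter, whether the MP shards are column-parallel, row-parallel, or replicated. A rough sketch, with Megatron-style name patterns that are an assumption rather than a guaranteed match for every config:)

```python
import torch


def merge_mp_shards(shards, key):
    """Merge a list of per-MP-rank tensors belonging to one parameter name.

    The partition axes below follow Megatron-style conventions; the
    authoritative mapping is whatever the NeoX conversion scripts do for
    the actual model config.
    """
    if "query_key_value.weight" in key or "dense_h_to_4h.weight" in key:
        # Column-parallel layers are split along the output dimension.
        return torch.cat(shards, dim=0)
    if "attention.dense.weight" in key or "dense_4h_to_h.weight" in key:
        # Row-parallel layers are split along the input dimension.
        return torch.cat(shards, dim=1)
    # Replicated parameters (e.g. norm scales, including RMSNorm) are
    # identical on every rank, so any single copy is the full tensor.
    return shards[0]
```

One subtlety worth checking is the fused QKV weight: each rank holds only its own heads' Q/K/V blocks, so depending on the layout a plain concatenation can leave heads ordered differently from an MP=1 checkpoint, which would produce exactly the kind of random output described at the top of this issue.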

haileyschoelkopf commented 1 year ago

I see--I think I'm a bit confused about which cases work and which do not for you, and about what code you're using for conversion. If I'm understanding correctly, you're experiencing the following:

  1. When you train with PP>1 and MP=1, using a copy of tools/merge20b.py that you've edited in your fork, you get the same outputs in your HF fork as in NeoX.
  2. When you're training with MP>1, using tools/convert_to_hf.py does not work.

Could you try the following?

For your model with PP>1, you'll want to try using (an edited RMSNorm version of) tools/convert_v1.0_to_hf.py I believe.
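
Once the conversion runs, a quick way to check it beyond eyeballing samples is to compare logits between the NeoX model and the converted HF model on the same prompt. A minimal sketch, assuming you first dump reference logits from the NeoX side to a file (the paths and file name here are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "reference_logits.pt" is assumed to hold logits dumped from the NeoX model
# for the same prompt and the same tokenization.
model = AutoModelForCausalLM.from_pretrained("./converted_hf_model")
tokenizer = AutoTokenizer.from_pretrained("./converted_hf_model")

inputs = tokenizer("Common sense is not so common.", return_tensors="pt")
with torch.no_grad():
    hf_logits = model(**inputs).logits

reference = torch.load("reference_logits.pt")
# Small numerical drift is expected (fused kernels, fp16 vs fp32); large
# deviations point at mis-merged or mis-permuted weights.
print("max abs diff:", (hf_logits - reference).abs().max().item())
```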

StellaAthena commented 1 year ago

@guozhiyao Hey, following up on this.

Quentin-Anthony commented 1 year ago

Closing this due to inactivity. Feel free to reopen if you'd like to continue investigating @guozhiyao