Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

Support a new model #1475

Open takgto opened 3 months ago

takgto commented 3 months ago

Do you have a plan to support the JetMoE model (https://github.com/myshell-ai/JetMoE) in LitGPT? It is very effective at reducing computational cost during inference.

rasbt commented 3 months ago

Hi there, thanks for the suggestion! New models are always welcome. JetMoE is currently not on the priority list because of the many other requests and features to be added, but if you want to contribute it, that'd be great!

rasbt commented 3 months ago

I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md

takgto commented 3 months ago

> I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md

Thanks so much for the information; it is really valuable to me. Currently I'm having difficulty updating the checkpoint conversion script (convert_hf_checkpoint.py) for the new model (jetmoe/jetmoe-8b). I think it needs another weight_map in the script, but I can't figure out some keys of the new model, as shown below (the `?` marks denote unknown keys):

```python
weight_map = {
    "model.embed_tokens.weight": "transformer.wte.weight",
    "model.layers.{}.mlp.output_linear.weight": ?,  # ? means the LitGPT key is unknown
    "model.layers.{}.mlp.router.layer.weight": ?,
    "model.layers.{}.input_layernorm.weight": "transformer.h.{}.norm_1.weight",
    "model.layers.{}.mlp.bias": ?,
    "model.layers.{}.mlp.input_linear.weight": ?,
    "model.layers.{}.post_attention_layernorm.weight": "transformer.h.{}.norm_2.weight",
    "model.layers.{}.self_attention.experts.bias": ?,
    "model.layers.{}.self_attention.experts.input_linear.weight": ?,
    "model.layers.{}.self_attention.experts.output_linear.weight": ?,
    "model.layers.{}.self_attention.experts.router.layer.weight": "transformer.h.{}.attn.experts.out_proj.weight",
    "model.layers.{}.self_attention.kv_proj.weight": ?,
    "model.norm.weight": "transformer.ln_f.weight",
    "model.layers.{}.self_attention.q_proj.weight": "transformer.h.{}.attn.q_proj.weight",
    "model.layers.{}.self_attention.k_proj.weight": "transformer.h.{}.attn.k_proj.weight",
    "model.layers.{}.self_attention.v_proj.weight": "transformer.h.{}.attn.v_proj.weight",
}
```

Do you know any tools or documentation to find those unknown keys?

rasbt commented 3 months ago

That's a good question, and it's usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes because of naming conventions and sometimes because it may not be supported yet. I think in this case LlamaMoE might be a good template to look at:

https://github.com/Lightning-AI/litgpt/blob/e2f8074b32ce08852f933636d1d81689990e1771/litgpt/scripts/convert_hf_checkpoint.py#L138
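Roughly, the idea behind those weight maps is that the HF keys use `{}` placeholders for the layer index, which get substituted during conversion. Here is a simplified sketch of that pattern (not the exact code from `convert_hf_checkpoint.py`, using only a few entries from your proposed map above as illustrative examples):

```python
import re

# Illustrative entries copied from the map above; the "?" entries would
# need the matching LitGPT parameter names filled in.
weight_map = {
    "model.embed_tokens.weight": "transformer.wte.weight",
    "model.layers.{}.input_layernorm.weight": "transformer.h.{}.norm_1.weight",
    "model.layers.{}.self_attention.q_proj.weight": "transformer.h.{}.attn.q_proj.weight",
}

def map_key(hf_name: str) -> str:
    """Replace the layer index with '{}', look up the template, then restore the index."""
    match = re.search(r"\.(\d+)\.", hf_name)
    if match is None:
        return weight_map[hf_name]
    layer_idx = match.group(1)
    template = hf_name.replace(f".{layer_idx}.", ".{}.")
    return weight_map[template].format(layer_idx)

print(map_key("model.layers.3.input_layernorm.weight"))
# -> transformer.h.3.norm_1.weight
```

So the `?` entries mainly need the matching LitGPT parameter names, assuming the corresponding modules exist in LitGPT at all.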

rasbt commented 3 months ago

I haven't read the JetMoE paper yet; do they also have separate attention experts? In that case, this would not be supported yet. The LlamaMoE class only covers the MLP layers, as in Mixtral.

takgto commented 3 months ago

Thank you for your continued support. According to the JetMoE technical page (https://research.myshell.ai/jetmoe), JetMoE has two kinds of MoE layers, Mixture of Attention heads (MoA) and Mixture of MLP Experts (MoE), similar to ModuleFormer (https://arxiv.org/abs/2306.04640). So the LlamaMoE model might not fit JetMoE. Separately, I have asked the JetMoE authors to provide parameter-mapping information (https://github.com/myshell-ai/JetMoE/issues/11), but I haven't received a reply yet.
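For what it's worth, my rough understanding is that both layer types share a top-k routing mechanism (the report describes activating 2 of 8 experts per token, if I read it correctly). A minimal conceptual sketch, not JetMoE's actual implementation, with names chosen only to echo the `router.layer.weight` keys above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal top-k router, conceptually shared by the MoA and MoE layers."""

    def __init__(self, dim: int, n_experts: int = 8, top_k: int = 2) -> None:
        super().__init__()
        self.top_k = top_k
        self.layer = nn.Linear(dim, n_experts, bias=False)  # cf. "router.layer.weight"

    def forward(self, x: torch.Tensor):
        logits = self.layer(x)                                    # (batch, seq, n_experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                      # renormalize over selected experts
        return weights, indices                                   # which experts to run, and how to mix them

router = TopKRouter(dim=2048)
w, idx = router(torch.randn(1, 4, 2048))
print(w.shape, idx.shape)  # torch.Size([1, 4, 2]) torch.Size([1, 4, 2])
```

Each MoA or MoE layer would then run only the selected experts and mix their outputs with these weights, which is why the mapping onto LitGPT's current attention and MLP classes is not straightforward.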

rasbt commented 3 months ago

Oh I see, the Mixture of Attention heads (MoA) part will be tricky then; that's currently not supported by LitGPT and would have to be implemented from scratch, which might make a contribution like this fairly involved.