arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Error at MoE Qwen 1.5B #395

Closed: ehristoforu closed this issue 2 months ago

ehristoforu commented 2 months ago
mergekit-moe config.yaml merge --copy-tokenizer --device cuda --low-cpu-memory --trust-remote-code
ERROR:root:No output architecture found that is compatible with the given models.
ERROR:root:All supported output architectures:
ERROR:root:  * Mixtral
ERROR:root:  * DeepSeek MoE
ERROR:root:  * Qwen MoE

I have the latest version of mergekit, I am using only Qwen2 models (all 1.5B), and there is no custom code.

ZhangJiayuan-BUAA commented 2 months ago

Qwen models must specify one shared expert. I hope this will solve your problem.

mergekit/moe/qwen.py (screenshot attached)
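
As far as I know, in recent mergekit versions this is expressed with a shared_experts section in the mergekit-moe config, and the Qwen MoE output architecture checked in mergekit/moe/qwen.py expects exactly one entry there. A minimal sketch of what that section looks like (the model name and prompts below are placeholders, and I am assuming shared expert entries take the same fields as regular experts):

shared_experts:
  - source_model: your-org/Qwen2-1.5B-base        # placeholder; often the base_model is reused here
    positive_prompts: ["general assistant"]       # placeholder prompts for the shared expert gate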

ehristoforu commented 2 months ago

Qwen models must specify one shared expert. I hope this will solve your problem.

mergekit/moe/qwen.py (screenshot attached)

Sorry, I don't quite understand what "one shared expert" means.

Here is my config:

MODEL_NAME = "newmoe-0"
yaml_config = """
base_model: ehristoforu/Qwen2-1.5b-it-chat
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: ehristoforu/Qwen2-1.5b-it-chat
    positive_prompts: ["chat", "assistant", "chat history", "chat context", "writing", "text writing", "editing", "text editing", "multilingual"]
  - source_model: ehristoforu/Qwen2-1.5b-it-bioinstruct
    positive_prompts: ["bio", "science", "biology", "natural sciences", "scientist"]
  - source_model: ehristoforu/Qwen2-1.5b-it-codealpaca
    positive_prompts: ["code", "coding", "coder", "programming", "programmer", "code analysis", "code review", "code fix", "code improvement"]
  - source_model: ehristoforu/Qwen2-1.5b-it-math
    positive_prompts: ["math", "mathematician", "problem solving", "calculating", "logics"]
"""

All models are in fp16, in PyTorch (.bin) format, fine-tuned from Qwen2 1.5B without any prior merging.

What exactly do I need to change?
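
If it just needs a shared_experts section as suggested above (I am guessing at the exact syntax and field names), would appending something like this to my config be enough?

shared_experts:
  - source_model: ehristoforu/Qwen2-1.5b-it-chat   # guess: reuse the base model as the single shared expert
    positive_prompts: ["chat", "assistant"]        # guess: same fields as a regular expert entry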

lijinfeng0713 commented 1 month ago

(quotes the full exchange above)

Have you solved your problem? I get the same error when merging qwen2-0.5b.