Closed ehristoforu closed 2 months ago
Qwen models must specify one shared expert. I hope this will solve your problem.
mergekit/moe/qwen.py
Sorry, I didn't quite understand what "one shared expert" means.
Here is my config:
MODEL_NAME = "newmoe-0"
yaml_config = """
base_model: ehristoforu/Qwen2-1.5b-it-chat
gate_mode: hidden
dtype: bfloat16
experts:
- source_model: ehristoforu/Qwen2-1.5b-it-chat
positive_prompts: ["chat", "assistant", "chat history", "chat context", "writing", "text writing", "editing", "text editing", "multilingual"]
- source_model: ehristoforu/Qwen2-1.5b-it-bioinstruct
positive_prompts: ["bio", "science", "biology", "natural sciences", "scientist"]
- source_model: ehristoforu/Qwen2-1.5b-it-codealpaca
positive_prompts: ["code", "coding", "coder", "programming", "programmer", "code analysis", "code review", "code fix", "code improvement"]
- source_model: ehristoforu/Qwen2-1.5b-it-math
positive_prompts: ["math", "mathematician", "problem solving", "calculating", "logics"]
"""
All models in fp16, in PyTorch (.bin) format, pre-trained Qwen2 1.5B without merging.
What exactly do I need to change?
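Based on the reply above, the missing piece appears to be a `shared_experts` section: mergekit's Qwen MoE output requires exactly one shared expert in addition to the routed experts. A minimal sketch of what to append to the config above, assuming `shared_experts` is the relevant key in your mergekit version; the choice of the chat model as the shared expert and the `residual_scale` value are assumptions to tune, not a confirmed recipe:

```yaml
# Appended to the existing experts config:
shared_experts:
  - source_model: ehristoforu/Qwen2-1.5b-it-chat  # assumption: reuse the base/chat model as the shared expert
    positive_prompts: ["chat", "assistant"]        # prompts for the shared-expert gate
    residual_scale: 0.1                            # assumption: down-weight the shared expert's contribution
```

If your mergekit version rejects the key, check `mergekit/moe/qwen.py` in your installed copy for the exact config it expects.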
Have you solved your problem? I get the same error when merging Qwen2-0.5B.
I'm on the latest version of mergekit, using only Qwen2 models at the 1.5B size, with no custom code.