arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

KeyError: 'model.embed_tokens.weight' when using mergekit-moe #97

Open axrwl opened 10 months ago

axrwl commented 10 months ago

I am getting the following error when I run mergekit-moe config.yml ./output:

File "/home/ubuntu/moe_experiments/mergekit/mergekit/options.py", line 58, in wrapper
    f(*args, **kwargs)
  File "/home/ubuntu//moe_experiments/mergekit/mergekit/scripts/mixtral_moe.py", line 394, in main
    build(
  File "/home/ubuntu//moe_experiments/mergekit/mergekit/scripts/mixtral_moe.py", line 277, in build
    tensor = base_loader.get_tensor(tensor_name)
  File "/home/ubuntu//moe_experiments/mergekit/mergekit/io/lazy_tensor_loader.py", line 127, in get_tensor
    raise KeyError(key)
KeyError: 'model.embed_tokens.weight'

My config file is:

base_model: cognitivecomputations/dolphin-2_6-phi-2
gate_mode: cheap_embed
experts:
  - source_model: cognitivecomputations/dolphin-2_6-phi-2
    positive_prompts: [""]
  - source_model: lxuechen/phi-2-dpo
    positive_prompts: [""]
  - source_model: Yhyu13/phi-2-sft-dpo-gpt4_en-ep1
    positive_prompts: [""]
  - source_model: mrm8488/phi-2-coder
    positive_prompts: [""]

except the prompt arrays are not empty. I am on the mixtral branch.
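
For reference, here is a minimal sketch of how to list the tensor names the base checkpoint actually exposes, to compare against the 'model.embed_tokens.weight' key the loader asks for (assuming the repo ships a sharded safetensors index; adjust the filename if it uses a single file or .bin shards):

```python
# Hedged sketch: print the tensor names present in the base checkpoint.
# Assumes a sharded safetensors index file (model.safetensors.index.json).
import json
from huggingface_hub import hf_hub_download

index_path = hf_hub_download(
    "cognitivecomputations/dolphin-2_6-phi-2", "model.safetensors.index.json"
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

for name in sorted(weight_map):
    print(name)  # phi-2 checkpoints name their tensors differently from llama/mistral
```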

Xingxiangrui commented 10 months ago

I ran into the same problem when trying to merge the DeepSeek LLaMA model into Mixtral. https://huggingface.co/deepseek-ai/deepseek-llm-7b-base/tree/main It seems that some tensor key names are not supported in mergekit. We could load the tensors and rename the keys; it might work. I will give it a try later.
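
Something along these lines might work (a rough sketch, untested against mergekit; KEY_MAP is a hypothetical example and would need to match the real names in the checkpoint):

```python
# Rough sketch of the "load the tensors and rename the keys" idea above.
# Not tested against mergekit; KEY_MAP is a hypothetical example mapping.
from safetensors.torch import load_file, save_file

KEY_MAP = {
    # hypothetical: checkpoint-specific name -> name mergekit expects
    "transformer.embd.wte.weight": "model.embed_tokens.weight",
}

def rename_keys(in_path: str, out_path: str) -> None:
    tensors = load_file(in_path)                       # dict[name -> torch.Tensor]
    renamed = {KEY_MAP.get(k, k): v for k, v in tensors.items()}
    save_file(renamed, out_path)

rename_keys("model.safetensors", "model-renamed.safetensors")
```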

naseerfaheem commented 10 months ago

@cg123 I have the same issue using 6 different phi-2 models. It would be great if it told me which of the 6 is causing the problem.
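
In the meantime, a small script can check each expert for the expected key up front (a sketch assuming each repo ships a sharded safetensors index; the repo list below is just the one from the config above, not the full set of 6):

```python
# Sketch: report which expert repos are missing the tensor key that
# mergekit-moe raises the KeyError on. Single-file or .bin checkpoints
# would need a different check.
import json
from huggingface_hub import hf_hub_download

EXPERTS = [
    "cognitivecomputations/dolphin-2_6-phi-2",
    "lxuechen/phi-2-dpo",
    "Yhyu13/phi-2-sft-dpo-gpt4_en-ep1",
    "mrm8488/phi-2-coder",
]

for repo in EXPERTS:
    index_path = hf_hub_download(repo, "model.safetensors.index.json")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    status = "ok" if "model.embed_tokens.weight" in weight_map else "MISSING model.embed_tokens.weight"
    print(f"{repo}: {status}")
```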

cg123 commented 10 months ago

Support for phi-based models actually hasn't been added to mergekit-moe yet - I believe @mlabonne used his own customized fork. The mainline mergekit-moe currently only supports Llama and Mistral models. Sorry for the trouble!
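
As a quick sanity check before running a merge, something like this prints each model's architecture type so unsupported ones stand out (a sketch; phi-2 repos may need trust_remote_code for their custom configuration code):

```python
# Sketch: print each model's architecture type. Per the comment above,
# only llama/mistral-style models are supported by mainline mergekit-moe.
from transformers import AutoConfig

MODELS = [
    "cognitivecomputations/dolphin-2_6-phi-2",   # phi-based
    "mistralai/Mistral-7B-v0.1",                 # mistral-based (supported)
]

for repo in MODELS:
    cfg = AutoConfig.from_pretrained(repo, trust_remote_code=True)
    print(f"{repo}: model_type={cfg.model_type}")
```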

ZhangEnmao commented 10 months ago

> I ran into the same problem when trying to merge the DeepSeek LLaMA model into Mixtral. https://huggingface.co/deepseek-ai/deepseek-llm-7b-base/tree/main It seems that some tensor key names are not supported in mergekit. We could load the tensors and rename the keys; it might work. I will give it a try later.

Hello, have you succeeded with that experiment yet?

v-prgmr commented 5 months ago

@ZhangEnmao @naseerfaheem @axrwl @Xingxiangrui I was finally able to merge two phi-2 experts. If you are still looking to use mergekit for this, check out the phi2xtral branch here: https://github.com/v-prgmr/mergekit/tree/phi2xtral