arcee-ai / mergekit

Tools for merging pretrained large language models.

Sincerely.. I have no words...without insulting someone. #353

Open 0wwafa opened 4 months ago

0wwafa commented 4 months ago

Considering I have metered internet and not-so-great resources, I followed your guide and the notebook. I used this YAML:

slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [0, 32]
        density: 0.5
        weight: 0.3
      - model: lucyknada/microsoft_WizardLM-2-7B
        layer_range: [0, 32]
        density: 0.5
        weight: 0.3
merge_method: ties
base_model: mistralai/Mistral-7B-Instruct-v0.3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5

dtype: bfloat16

After more than an hour on Colab, I uploaded the merged model to Hugging Face: https://huggingface.co/ZeroWw/ZeroWw-mwiz-7B-slerp

Then I quantized it and downloaded it (painfully, from where I am).

The result:

llama_model_load: error loading model: check_tensor_dims: tensor 'token_embd.weight' has wrong shape; expected  4096, 32768, got  4096, 32000,     1,     1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'ZeroWw-mwiz-7B-slerp.f16.q6.gguf'
main: error: unable to load model

Meh.

johnwee1 commented 4 months ago

That's because you are trying to merge models with different underlying base models. WizardLM-2 is not a finetune of Mistral v0.3, it's a finetune of v0.1, so it's not surprising that it doesn't work: the two checkpoints don't share a base. Mistral v0.3 uses an extended 32768-token vocabulary while v0.1 uses 32000, which is exactly the token_embd.weight shape mismatch your error shows.

see #324
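
For reference, the shape mismatch goes away when every checkpoint in the merge descends from the same base. A purely illustrative ties sketch along those lines; the model choices and parameter values below are assumptions, not a tested recipe:

models:
  - model: lucyknada/microsoft_WizardLM-2-7B     # finetune of Mistral-7B-v0.1
    parameters:
      density: 0.5
      weight: 0.3
  - model: mistralai/Mistral-7B-Instruct-v0.1    # same 32000-token vocabulary
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: bfloat16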

0wwafa commented 4 months ago

Well... there should be a flag or something to make that possible...
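
There is a related knob: mergekit exposes an experimental tokenizer_source option (base, union, or model:<path>) for inputs whose tokenizers disagree. A minimal sketch of where it could sit in a config like the one above; whether it cleanly bridges the 32000 vs 32768 vocabulary gap in this particular pairing has not been verified here:

models:
  - model: mistralai/Mistral-7B-Instruct-v0.3
    parameters:
      density: 0.5
      weight: 0.5
  - model: lucyknada/microsoft_WizardLM-2-7B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mistralai/Mistral-7B-Instruct-v0.3
tokenizer_source: union    # experimental: build a combined tokenizer for the output
dtype: bfloat16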