arcee-ai / mergekit

Tools for merging pretrained large language models.

Why do two different options generate models of different sizes? #49

Closed · DopeorNope-Lee closed this issue 10 months ago

DopeorNope-Lee commented 11 months ago

Option 1

slices:
  - sources:
    - model: AIDC-ai-business/Marcoroni-7B-v3
      layer_range: [0, 24]
  - sources:
    - model: Toten5/Marcoroni-neural-chat-7B-v2
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16

Option 2

slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 24]
      - model: Toten5/Marcoroni-neural-chat-7B-v2
        layer_range: [8, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16

I generated two models using two different config options.

The first config produced a 10.7B model; the second produced a 5.5B model.

I used the same base models and the same layer information, yet got different results.

Could anyone explain why?

Also, for the slerp merge, are there other options for the parameters (especially for filter: besides mlp and self_attn, are there others)?

Thanks!

cg123 commented 10 months ago

The difference is that in your first config, you're defining two output slices:

slices:
  - sources: # output slice #1
    - model: AIDC-ai-business/Marcoroni-7B-v3
      layer_range: [0, 24]
  - sources: # output slice #2
    - model: Toten5/Marcoroni-neural-chat-7B-v2
      layer_range: [8, 32]

These simply get stacked on top of each other: 24 layers from the first slice plus 24 from the second, giving you a final model with 48 layers instead of the 32 that a 7B model has (hence the ~10.7B parameter count). In your second config, you're defining a single output slice that combines two input slices:

slices:
  - sources: # output slice #1
      - model: AIDC-ai-business/Marcoroni-7B-v3 # input slice #1
        layer_range: [0, 24]
      - model: Toten5/Marcoroni-neural-chat-7B-v2 # input slice #2
        layer_range: [8, 32]

The two input slices are combined using the merge method you specified (slerp, in this case). That means layer 0 of AIDC-ai-business/Marcoroni-7B-v3 will be SLERP-merged with layer 8 of Toten5/Marcoroni-neural-chat-7B-v2, layer 1 with layer 9, and so on. The end result is the size of a single input slice, so just 24 layers.
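To make the size difference concrete, here is a minimal Python sketch of both behaviors. The slerp helper below is a simplified stand-in for illustration, not mergekit's actual implementation; it only assumes that layer_range is half-open, as in the configs above.

import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Spherical linear interpolation between two weight tensors (illustrative only).
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_dir = a_flat / (a_flat.norm() + eps)
    b_dir = b_flat / (b_flat.norm() + eps)
    omega = torch.arccos(a_dir.dot(b_dir).clamp(-1.0, 1.0))  # angle between the tensors
    so = torch.sin(omega)
    if so.abs() < eps:  # nearly parallel: fall back to linear interpolation
        merged = (1.0 - t) * a_flat + t * b_flat
    else:
        merged = (torch.sin((1.0 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

# Option 1 (passthrough): output slices are concatenated, so layer counts add up.
range_a, range_b = (0, 24), (8, 32)  # half-open layer ranges from the configs
stacked = (range_a[1] - range_a[0]) + (range_b[1] - range_b[0])
print(stacked)  # 48 layers -> the ~10.7B model

# Option 2 (slerp): layers are paired index by index and merged pairwise,
# so the output is only as deep as one input slice.
pairs = list(zip(range(*range_a), range(*range_b)))
print(len(pairs))           # 24 layers -> the ~5.5B model
print(pairs[0], pairs[-1])  # (0, 8) (23, 31)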

As for the filter options: filter works by searching for the substring you specify in each tensor name, so the available filters depend on the architecture you're merging. If you want to know all of the tensor names in a Mistral model, you can see a list on huggingface here.
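For example, here is a rough sketch of how that substring matching could pick a t value per tensor, using standard Mistral tensor names. The rule-matching loop is a simplified approximation for illustration, not mergekit's actual code.

# Simplified illustration of how filter rules could be applied per tensor.
t_rules = [
    ("self_attn", [0, 0.5, 0.3, 0.7, 1]),  # gradient for attention tensors
    ("mlp",       [1, 0.5, 0.7, 0.3, 0]),  # gradient for MLP tensors
    (None,        0.5),                    # default for everything else
]

tensor_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.input_layernorm.weight",
    "lm_head.weight",
]

for name in tensor_names:
    for substring, value in t_rules:
        if substring is None or substring in name:
            print(f"{name}: t = {value}")
            break

Tensors that match neither filter (embeddings, layer norms, lm_head) fall through to the default t of 0.5, and a list of values acts as a gradient interpolated across the layers being merged.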

Hope this helps!