The difference is that in your first config, you're defining two output slices:
```yaml
slices:
  - sources: # output slice #1
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 24]
  - sources: # output slice #2
      - model: Toten5/Marcoroni-neural-chat-7B-v2
        layer_range: [8, 32]
```
These simply get stacked on top of each other: each `layer_range: [a, b]` selects 24 layers (the end index is exclusive), so you get a final model with 48 layers instead of the 32 that a 7B model has.
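That 48-layer count also lines up with the 10.7B size you saw. As a rough back-of-the-envelope check (assuming standard Mistral-7B dimensions: hidden size 4096, MLP size 14336, 32k vocabulary), each transformer layer holds about 0.218B parameters, and the embedding matrix plus LM head add roughly 0.26B:

$$48 \times 0.218\,\mathrm{B} + 0.26\,\mathrm{B} \approx 10.7\,\mathrm{B}$$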
In your second config, you're defining a single output slice that combines two input slices:

```yaml
slices:
  - sources: # output slice #1
      - model: AIDC-ai-business/Marcoroni-7B-v3   # input slice #1
        layer_range: [0, 24]
      - model: Toten5/Marcoroni-neural-chat-7B-v2 # input slice #2
        layer_range: [8, 32]
```
The two input slices will be combined using the merge method you specified (slerp here). That means layer 0 of AIDC-ai-business/Marcoroni-7B-v3 will be SLERP-merged with layer 8 of Toten5/Marcoroni-neural-chat-7B-v2, layer 1 with layer 9, and so on. The end result is the size of a single input slice, so just 24 layers.
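The same back-of-the-envelope arithmetic, under the same assumed Mistral-7B dimensions, matches the 5.5B size reported for the second config:

$$24 \times 0.218\,\mathrm{B} + 0.26\,\mathrm{B} \approx 5.5\,\mathrm{B}$$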
As for the filter option: filter works by searching for the substring you specify in the tensor names, so the available values depend on the architecture you're merging. If you want to know all of the tensor names in a Mistral model, you can see a list on huggingface here.
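As a sketch of how this is typically used, patterned after the example configs in the mergekit README (the interpolation values below are arbitrary, chosen just to show the shape of the config), a slerp merge can give attention and MLP tensors different blend curves by filtering on those substrings:

```yaml
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 32]
      - model: Toten5/Marcoroni-neural-chat-7B-v2
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn   # applies to tensors whose names contain "self_attn"
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp         # applies to tensors whose names contain "mlp"
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5          # fallback for all remaining tensors (embeddings, norms, etc.)
dtype: bfloat16
```

A list of values defines a gradient that is interpolated across the layer depth, so with the values above the attention tensors lean toward the second model in later layers while the MLP tensors do the opposite.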
Hope this helps!
Option 1 / Option 2: (config screenshots; the same YAML is reproduced in the answer above.)
I generated two models using different config options.
The first one produced a 10.7B model, while the second one produced a 5.5B model.
I used the same base models and the same layer ranges, but they yielded different results.
Could anyone explain why these different results occurred?
Also, in a slerp merge, are there any other options for the parameters (especially for filter: mlp, self_attn, or others)?
Thanks!