arcee-ai / mergekit

Tools for merging pretrained large language models.

Merging fails with RuntimeError: weight required but not present in model #284

Open w601sxs opened 4 months ago

w601sxs commented 4 months ago

I'm trying to merge some embedding models with the config file below. The architectures are similar, but I think it is erroring out on the names of some layers. I'd love some suggestions on how to change the YAML to make it work.

YAML config:

models:
  - model: mixedbread-ai/mxbai-embed-large-v1
  - model: BAAI/bge-large-en-v1.5
    parameters:
      density: [0, 0.25, 0.5, 0.75, 1]
      weight: [0, 0.25, 0.5, 0.75, 1]
  - model: avsolatorio/GIST-large-Embedding-v0
    parameters:
      density: [0, 0.25, 0.5, 0.75, 1]
      weight: [0, 0.25, 0.5, 0.75, 1]
  - model: WhereIsAI/UAE-Large-V1
    parameters:
      density: [0, 0.25, 0.5, 0.75, 1]
      weight: [0, 0.25, 0.5, 0.75, 1]
merge_method: dare_ties
base_model: mixedbread-ai/mxbai-embed-large-v1
parameters:
  int8_mask: true
dtype: bfloat16

Error

RuntimeError: Tensor bert.encoder.layer.23.output.LayerNorm.weight required but not present in model WhereIsAI/UAE-Large-V1

CLI used

!mergekit-yaml merge.yaml ./output --cuda
w601sxs commented 4 months ago

Maybe an example of how to frankenmerge with passthrough?
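Something like the sketch below is what I have in mind, though I'm not sure the slice boundaries or model pairing make sense (these are just placeholders):

slices:
- sources:
  - model: mixedbread-ai/mxbai-embed-large-v1
    layer_range: [0, 16]
- sources:
  - model: BAAI/bge-large-en-v1.5
    layer_range: [8, 24]
merge_method: passthrough
dtype: bfloat16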

metric-space commented 4 months ago

Hey there, thank you for the detailed issue. This is definitely a bug.

For now, a quick fix to get this working on your end is to open mergekit/_data/architectures/bert.json and replace all instances of bert. with an empty string,

and that should hopefully get you going with your current config.
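If it helps, a one-liner for that replacement with GNU sed, assuming you're running mergekit from a source checkout (back up the file first):

sed -i 's/bert\.//g' mergekit/_data/architectures/bert.json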

That said, we will be putting out a proper bug fix soon.

w601sxs commented 4 months ago

I'll try that in a local branch and wait for the fix! Thanks

cg123 commented 4 months ago

Thanks for the bug report! PR #295 should fix this issue. If you run into any further trouble please let me know - the BERT support is quite fresh and I appreciate knowing where it fails.

yaof20 commented 4 months ago

Hi Charles! Thanks for the great work!

I am encountering a similar issue.

I am using the phi-1 and phi-1.5 models; the config YAML file is as follows.

dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 8]
    model: microsoft/phi-1
- sources:
  - layer_range: [4, 12]
    model: microsoft/phi-1
- sources:
  - layer_range: [8, 16]
    model: microsoft/phi-1
- sources:
  - layer_range: [12, 20]
    model: microsoft/phi-1
- sources:
  - layer_range: [16, 24]
    model: microsoft/phi-1
- sources:
  - layer_range: [20, 28]
    model: microsoft/phi-1
- sources:
  - layer_range: [24, 32]
    model: microsoft/phi-1

Both phi-1 and phi-1.5 give me the following error. (I also tried TinyLlama; it gave me the same issue.)

RuntimeError: Tensor model.layers.31.mlp.fc2.weight required but not present in model microsoft/phi-1_5

In addition, how can I run the same YAML config for the phi-3 model, whose architecture is currently not included in the package?

Thanks! @cg123

cg123 commented 4 months ago

@yaof20 This is because microsoft/phi-1 only has 24 layers, but you're telling mergekit to look for 32 in total. If you adjust your config to only use layers 0-24 instead, it should work properly.
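For example, keeping the same sliding-window pattern but capping at layer 24 (these slice boundaries are just an illustration, not a recommendation):

dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 8]
    model: microsoft/phi-1
- sources:
  - layer_range: [4, 12]
    model: microsoft/phi-1
- sources:
  - layer_range: [8, 16]
    model: microsoft/phi-1
- sources:
  - layer_range: [12, 20]
    model: microsoft/phi-1
- sources:
  - layer_range: [16, 24]
    model: microsoft/phi-1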

As for Phi-3 - I'll add support for it in the next couple of days!

yaof20 commented 4 months ago

> @yaof20 This is because microsoft/phi-1 only has 24 layers, but you're telling mergekit to look for 32 in total. If you adjust your config to only use layers 0-24 instead, it should work properly.
>
> As for Phi-3 - I'll add support for it in the next couple of days!

Thanks for the reply!