arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Mangled results when merging two 7B models into a smaller model #24

Closed: fakerybakery closed this issue 11 months ago

fakerybakery commented 11 months ago

Hi, I'm using this config:

slices:
  - sources:
    - model: HuggingFaceH4/zephyr-7b-beta
      layer_range: [0, 12]
  - sources:
    - model: argilla/notus-7b-v1
      layer_range: [28, 32]
merge_method: passthrough
dtype: bfloat16
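
(For scale: layer ranges are end-exclusive, so [0, 12] keeps layers 0-11 of zephyr and [28, 32] keeps layers 28-31 of notus, leaving the merged model with 16 of the original 32 layers. A quick sanity check on the merge output, assuming minihermes below is the local directory the merge was written to:)

from transformers import AutoConfig

# Inspect the merged model's config without loading any weights.
cfg = AutoConfig.from_pretrained('minihermes')
print(cfg.num_hidden_layers)  # expect 16 (12 from zephyr + 4 from notus)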

And when I run this code:

from transformers import pipeline
import torch
pipe = pipeline('text-generation', model='minihermes', device='mps')
pipe('Python is a programming language')

I get:

[{'generated_text': 'Python is a programming language, <a\n luego without.\n\n\n\n\n\n\n\n'}]

I feel like I'm doing something wrong here... Is this an issue w/ the tokenizer?
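
(One quick way to rule the tokenizer in or out, assuming minihermes is the local merge directory: round-trip the prompt through it. If the decoded text comes back clean, the garbled generations are coming from the missing layers rather than the tokenizer.)

from transformers import AutoTokenizer

# Round-trip the prompt through the merged model's tokenizer.
tok = AutoTokenizer.from_pretrained('minihermes')
ids = tok('Python is a programming language')['input_ids']
print(tok.decode(ids))  # clean output here means the tokenizer itself is fine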

fakerybakery commented 11 months ago

My goal is to make a ~3-4B model. Do you know if this is possible?
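
(Back-of-the-envelope sizing, assuming the standard Mistral-7B shape that both zephyr and notus share: hidden size 4096, intermediate size 14336, 32 query heads with 8 KV heads of dimension 128, and a 32000-token vocab. Ignoring the small norm weights, a 16-layer slice lands in that range on paper:)

hidden, inter, vocab = 4096, 14336, 32000
kv_dim = 8 * 128  # grouped-query attention: 8 KV heads of dim 128

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o plus k/v projections
mlp = 3 * hidden * inter                          # gate, up, and down projections
per_layer = attn + mlp

embeddings = 2 * vocab * hidden                   # input embeddings plus lm_head
print((16 * per_layer + embeddings) / 1e9)        # ~3.75, i.e. about 3.75B params

(The same formula with all 32 layers gives about 7.24B, matching the published Mistral-7B size, so the estimate is at least self-consistent. Whether a 16-layer slice is actually usable is a separate question; see the reply below.)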

cg123 commented 11 months ago

Unfortunately, this is pretty expected. LLMs seem to be incredibly robust to duplicating layers, but actually removing them tends to destroy coherence very quickly.

It's possible to prune models down like this, but mergekit isn't the best tool for it. I'd recommend looking at https://github.com/princeton-nlp/LLM-Shearing and seeing if it can work for what you want.

Hope this helps!

fakerybakery commented 11 months ago

Hmm, makes sense. Thanks for the suggestions!