arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

About mergekit behavior by tokenizer #169

Open tomgm777 opened 8 months ago

tomgm777 commented 8 months ago

Hello. I have used mergekit to merge various models, but in the case below the merge process completes without any errors, yet when I load the result in text-generation-webui it outputs incomprehensible answers.

https://huggingface.co/cyberagent/calm2-7b (uses tokenizer.json)
https://huggingface.co/TheTravellingEngineer/llama2-7b-chat-hf-dpo (uses tokenizer.model)

Below is my yaml.

models:
  - model: models/calm2-7b
    # no parameters necessary for base model
  - model: models/llama2-7b-chat-hf-dpo # follow user intent
    parameters:
      density: 1
      weight: 0.4
merge_method: dare_ties
base_model: models/calm2-7b
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base

Command-line options: --cuda --allow-crimes --trust-remote-code

I tried other models as well, and this problem occurs whenever the base model uses tokenizer.json and the model being merged ships only tokenizer.model. Also, if I specify tokenizer_source: union, mergekit processing seems to freeze.
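For reference, a minimal sketch (assuming the transformers library and the local model directories from the yaml above) that prints which tokenizer files each model ships and which tokenizer class transformers resolves for it:

import os
from transformers import AutoTokenizer

for path in ["models/calm2-7b", "models/llama2-7b-chat-hf-dpo"]:
    # List the tokenizer files shipped with the checkpoint (tokenizer.json vs tokenizer.model).
    files = sorted(f for f in os.listdir(path) if f.startswith("tokenizer"))
    # Let transformers resolve the tokenizer class for this checkpoint.
    tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
    print(path, files, type(tok).__name__)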

Do you know of any workaround for this?

tomgm777 commented 8 months ago

P.S. I have already applied the modifications from https://github.com/arcee-ai/mergekit/issues/139#issuecomment-1925187686.

cg123 commented 8 months ago

Thanks for reporting this, I'll see if I can replicate it and figure out what's going on. It might be because calm2-7b seems to be using a different tokenizer class than base llama2. I hadn't tried that particular combination of things yet.

tomgm777 commented 7 months ago

Hello. I confirmed the same phenomenon with a 70B model. (Since I checked after quantization, it is possible that the quantization itself failed...)

Base model: https://huggingface.co/karakuri-ai/karakuri-lm-70b-v0.1
Merge model: https://huggingface.co/NeverSleep/MiquMaid-v2-70B-DPO

With these other merge models:
https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.0
https://huggingface.co/sophosympatheia/Midnight-Rose-70B-v2.0.3
https://huggingface.co/152334H/miqu-1-70b-sf

all of the responses are incomprehensible or silent. Only the merge with https://huggingface.co/ChuckMcSneed/Gembo-v1-70b succeeds and responds normally. I think they are all merges of the same type, so what is the difference? The yaml is the same as above.

cg123 commented 7 months ago

I think you're actually seeing a different issue with most of those merges. Miqu and its derivatives are not directly compatible with Llama 2 based models - they use a different rope_theta value. So similar to merging a CodeLlama based model with base Llama, it's expected that merging them will result in unintelligible output.
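As a quick check, here is a minimal sketch (assuming the transformers library; the repo ids are the ones mentioned in this thread) that prints the rope_theta from each model's config so mismatches are easy to spot:

from transformers import AutoConfig

for repo in ["karakuri-ai/karakuri-lm-70b-v0.1", "NeverSleep/MiquMaid-v2-70B-DPO"]:
    # rope_theta is stored in config.json for Llama-family models;
    # merging models whose values differ tends to produce garbled output.
    cfg = AutoConfig.from_pretrained(repo)
    print(repo, getattr(cfg, "rope_theta", None))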

Midnight Rose should work though - it's odd if that is having the same problem.

tomgm777 commented 7 months ago

Sorry, there was a mistake in my Midnight Rose merge settings. After fixing it, the merged model responds normally.

sophosympatheia commented 7 months ago

Sorry, there was a mistake in my Midnight Rose merge settings. After fixing it, the merged model responds normally.

What was the mistake that you fixed?

tomgm777 commented 7 months ago

Wow! Are you the author of the Midnight series? The Midnight Rose and Karakuri merge gave great results. Thank you!

It was an embarrassing mistake: I had loaded Midnight-Miqu instead of Midnight-Rose. (><) The Midnight-Rose merge worked correctly, just like Gembo-v1-70b.

However, one thing that still puzzles me is dreamgen/opus-v1.2-70b: the results were good even though its rope_theta value differs, like Miqu's. (tokenizer_source: base needs to be set; union resulted in an error.)