arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Need some help in merging same architectures, but with different tokens in their tokenizers #342

Open choprahetarth opened 3 months ago

choprahetarth commented 3 months ago

Hello! I have two models, CodeLlama-13b-Python and CodeLlama-13b, that I need to merge. The overall goal is to combine a model trained on Python with one trained on other languages. However, the biggest problem I am facing is this -

Traceback (most recent call last):
  File "/u/choprahetarth/all_files/model_merging/merger.py", line 22, in <module>
    run_merge(
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/merge.py", line 92, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/merge_methods/tokenizer_permute.py", line 85, in execute
    expanded = torch.stack(expanded, dim=0)
RuntimeError: stack expects each tensor to be equal size, but got [32000, 5120] at entry 0 and [32016, 5120] at entry 1
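
For context, the failure itself is just torch.stack refusing tensors with different shapes; a minimal reproduction outside mergekit (with the embedding dimension shrunk so it runs instantly):

import torch

# The two embed_tokens matrices have 32000 and 32016 rows respectively, so
# stacking them fails. The second dimension is reduced here purely for speed.
base_embed = torch.zeros(32000, 8)
python_embed = torch.zeros(32016, 8)
torch.stack([base_embed, python_embed], dim=0)  # RuntimeError: stack expects each tensor to be equal size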

Now the YAML file I have used to merge looks like this -

models:
  - model: meta-llama/CodeLlama-13b-Python-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - filter: embed_tokens
          value: 0
        - value: 1
  - model: meta-llama/CodeLlama-13b-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - filter: embed_tokens
          value: 0
        - value: 1
tokenizer_source: union
merge_method: dare_ties
base_model: meta-llama/CodeLlama-13b-hf
parameters:
  density: 0.5 # density gradient
  weight:
    - filter: embed_tokens
      value: 0
    - value: 1
  normalize: true
  int8_mask: true
dtype: float32
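
For reference, merger.py from the traceback just feeds this config into mergekit's Python API; a rough sketch of what that looks like (placeholder paths, and the exact MergeConfiguration / MergeOptions calls may differ between mergekit versions):

import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "./merge-config.yml"       # placeholder: path to the YAML above
OUTPUT_PATH = "./merged-codellama-13b"  # placeholder: output directory

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)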

As far as I can see, the problem arises when I merge the two models because their vocabulary sizes (and therefore their embedding layer sizes) differ, even though all of the other layers are the same.

Model 1 (python) - 
Layer: embed_tokens.weight | Size: torch.Size([32016, 5120])

Base Model - 
Layer: embed_tokens.weight | Size: torch.Size([32000, 5120])

How do I make sure that I IGNORE this layer, and keep everything else as is?

Also, somehow the model_stock merge method does work well here; I am just not sure why.

Other than the slight difference in the embedding layer size (which likely corresponds to a different vocabulary/token count), all of the other layer dimensions are exactly the same between the two models.
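
For completeness, the vocabulary mismatch can be confirmed without loading any weights; a quick sketch (assumes you have access to the gated meta-llama repos):

from transformers import AutoConfig, AutoTokenizer

for name in ["meta-llama/CodeLlama-13b-hf", "meta-llama/CodeLlama-13b-Python-hf"]:
    cfg = AutoConfig.from_pretrained(name)
    tok = AutoTokenizer.from_pretrained(name)
    # config vocab_size determines the embed_tokens row count; the tokenizer
    # length can differ from it (added/special tokens).
    print(f"{name}: config vocab_size={cfg.vocab_size}, tokenizer tokens={len(tok)}")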

choprahetarth commented 3 months ago
slices:
  - sources:
    - model: meta-llama/CodeLlama-13b-Python-hf
      layer_range: [2, 39]
    - model: meta-llama/CodeLlama-13b-hf
      layer_range: [2, 39]
tokenizer_source: union
merge_method: slerp
base_model: meta-llama/CodeLlama-13b-hf
layer_range: [2, 39]
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
  normalize: true
  int8_mask: true
dtype: float32 

Also, I have tried this sort of configuration as well. Got the same results.

cg123 commented 3 months ago

Could you please try this merge using the branch from #334? I believe it should fix this.

choprahetarth commented 3 months ago

Thank you so much, Charles (and for the amazing library as well!). However, I am getting this particular error after checking out the tokenizer_again branch -

Executing graph:   0%|          | 0/1820 [00:00<?, ?it/s]WARNING:root:Token '▁<EOT>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size
WARNING:root:Token '▁<MID>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size
WARNING:root:Token '▁<PRE>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size
WARNING:root:Token '▁<SUF>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size

Building tokenizer permutations:   0%|          | 0/2 [00:00<?, ?it/s]WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<EOT>' has index 32003>31999 (padding?)
WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<PRE>' has index 32000>31999 (padding?)
WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<SUF>' has index 32002>31999 (padding?)
WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<MID>' has index 32001>31999 (padding?)

Building tokenizer permutations: 100%|██████████| 2/2 [00:00<00:00, 12.86it/s]
Building tokenizer permutations: 100%|██████████| 2/2 [00:00<00:00, 12.85it/s]

Executing graph:   0%|          | 2/1820 [00:01<20:18,  1.49it/s]
Executing graph:   0%|          | 3/1820 [00:05<1:07:27,  2.23s/it]
Executing graph:   0%|          | 4/1820 [00:09<1:20:57,  2.67s/it]
Executing graph:   0%|          | 5/1820 [00:09<55:35,  1.84s/it]  
Traceback (most recent call last):
  File "/u/choprahetarth/all_files/model_merging/merger.py", line 22, in <module>
    run_merge(
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/merge.py", line 95, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/tokenizer/embed.py", line 63, in execute
    tokens_to_average = self.assign_embedding_sources(
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/tokenizer/embed.py", line 127, in assign_embedding_sources
    has_token = [p[token_id] >= 0 for p in permutation_list]
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/tokenizer/embed.py", line 127, in <listcomp>
    has_token = [p[token_id] >= 0 for p in permutation_list]
KeyError: 32010
srun: error: gpub002: task 0: Exited with exit code 1
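
In case it helps with debugging, the tokens flagged in the warnings above can be listed by diffing the two vocabularies; a small sketch (again assumes access to both checkpoints):

from transformers import AutoTokenizer

base_vocab = AutoTokenizer.from_pretrained("meta-llama/CodeLlama-13b-hf").get_vocab()
python_vocab = AutoTokenizer.from_pretrained("meta-llama/CodeLlama-13b-Python-hf").get_vocab()

# Tokens that only the Python tokenizer defines, or that sit at different indices.
diff = {t: i for t, i in python_vocab.items() if base_vocab.get(t) != i}
for token, idx in sorted(diff.items(), key=lambda kv: kv[1]):
    print(idx, repr(token))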

The config used is ->

models:
  - model: meta-llama/CodeLlama-13b-Python-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - filter: embed_tokens
          value: 0
        - value: 1
  - model: meta-llama/CodeLlama-13b-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - filter: embed_tokens
          value: 0
        - value: 1
tokenizer_source: union
merge_method: dare_ties
base_model: meta-llama/CodeLlama-13b-hf
parameters:
  density: 0.5 # density gradient
  weight:
    - filter: embed_tokens
      value: 0
    - value: 1
  normalize: true
  int8_mask: true
dtype: float32

choprahetarth commented 3 months ago

Okay, so the only workaround I have (somehow) been able to get working is to manually resize the model's embedding layer with Hugging Face ->

import torch
import transformers
from transformers import AutoTokenizer, LlamaForCausalLM

model = "meta-llama/CodeLlama-13b-hf"
print("================================FIRST MODEL STARTS HERE=================================")
tokenizer_normal = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Load the model
model_normal = LlamaForCausalLM.from_pretrained(model)

# Print the size of all layers in the model
for name, param in model_normal.named_parameters():
    print(f"Layer: {name} | Size: {param.size()}")

print("============================SECOND MODEL STARTS HERE====================================")

model = "meta-llama/CodeLlama-13b-Python-hf"
tokenizer_python = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Load the model
model_python = LlamaForCausalLM.from_pretrained(model)
model_python.resize_token_embeddings(len(tokenizer_normal))

# Print the size of all layers in the model
for name, param in model_python.named_parameters():
    print(f"Layer: {name} | Size: {param.size()}")
print(model_python.config)
# Check if the model's architecture is LlamaForCausalLM before pushing
if model_python.config.architectures[0] == "LlamaForCausalLM":
    # Push the second model to Hugging Face Hub
    model_python.push_to_hub("codellama-13b-hf-truncated-embeddings")
else:
    print("The model's architecture is not LlamaForCausalLM. Not pushing to hub.")
# Push the second model to Hugging Face Hub
# model_python.push_to_hub("codellama-13b-hf-truncated-embeddings")
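
A quick sanity check (continuing from the script above) to confirm that both embedding matrices now report the same number of rows, which is what the dare_ties stack needs:

# Both should now print the same shape, e.g. torch.Size([32000, 5120]), assuming
# len(tokenizer_normal) matches the base model's embedding size.
print(model_normal.get_input_embeddings().weight.shape)
print(model_python.get_input_embeddings().weight.shape)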

and then use this ->

models:
  - model: choprahetarth/codellama-13b-hf-truncated-embeddings
    parameters:
      density: 0.5 # density gradient
      weight:
        - value: 1
  - model: meta-llama/CodeLlama-13b-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - value: 1
tokenizer_source: union
merge_method: dare_ties
base_model: meta-llama/CodeLlama-13b-hf
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    # - filter: embed_tokens
    #   value: 0 # use lm_head from 8b_stage2_final
    - value: 0.5
  embed_slerp: true 
  normalize: true
  int8_mask: true
dtype: float16

to merge them together.

@cg123 I was wondering where exactly I should add this in the code (within mergekit, and possibly open a PR/branch as well). The library is way too complex for me to wrap my head around without documentation (as much as I respect you for writing it, I mean, it is amazing!). Just a small pointer on where I could add this as a contribution would be nice!