Open choprahetarth opened 3 months ago
slices:
  - sources:
      - model: meta-llama/CodeLlama-13b-Python-hf
        layer_range: [2, 39]
      - model: meta-llama/CodeLlama-13b-hf
        layer_range: [2, 39]
tokenizer_source: union
merge_method: slerp
base_model: meta-llama/CodeLlama-13b-hf
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
  normalize: true
  int8_mask: true
dtype: float32
Also, I have tried this sort of configuration as well. Got the same results.
Could you please try this merge using the branch from #334? I believe it should fix this.
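For reference, a merge like this can also be driven from Python, mirroring the mergekit README example; below is a minimal sketch along the lines of the merger.py that shows up in the traceback further down (the config path and output directory are placeholders):

# Minimal merge driver (a sketch; adjust the paths to your checkout of the PR branch).
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("merge-config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./merged-codellama",
    options=MergeOptions(copy_tokenizer=True),
)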
Thank you so much, Charles (and for the amazing library as well!). However, I am getting this particular error after checking out the tokenizer_again branch -
Executing graph:   0%| | 0/1820 [00:00<?, ?it/s]
WARNING:root:Token '▁<EOT>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size
WARNING:root:Token '▁<MID>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size
WARNING:root:Token '▁<PRE>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size
WARNING:root:Token '▁<SUF>' present in meta-llama/CodeLlama-13b-Python-hf tokenizer but >= vocab_size
Building tokenizer permutations:   0%| | 0/2 [00:00<?, ?it/s]
WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<EOT>' has index 32003>31999 (padding?)
WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<PRE>' has index 32000>31999 (padding?)
WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<SUF>' has index 32002>31999 (padding?)
WARNING:root:meta-llama/CodeLlama-13b-Python-hf token '▁<MID>' has index 32001>31999 (padding?)
Building tokenizer permutations: 100%|██████████| 2/2 [00:00<00:00, 12.85it/s]
Executing graph:   0%| | 2/1820 [00:01<20:18, 1.49it/s]
Executing graph:   0%| | 3/1820 [00:05<1:07:27, 2.23s/it]
Executing graph:   0%| | 4/1820 [00:09<1:20:57, 2.67s/it]
Executing graph:   0%| | 5/1820 [00:09<55:35, 1.84s/it]
Traceback (most recent call last):
  File "/u/choprahetarth/all_files/model_merging/merger.py", line 22, in <module>
    run_merge(
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/merge.py", line 95, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/tokenizer/embed.py", line 63, in execute
    tokens_to_average = self.assign_embedding_sources(
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/tokenizer/embed.py", line 127, in assign_embedding_sources
    has_token = [p[token_id] >= 0 for p in permutation_list]
  File "/u/choprahetarth/all_files/model_merging/mergekit/mergekit/tokenizer/embed.py", line 127, in <listcomp>
    has_token = [p[token_id] >= 0 for p in permutation_list]
KeyError: 32010
srun: error: gpub002: task 0: Exited with exit code 1
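The warnings line up with the two checkpoints' layouts: the Python tokenizer defines the four infill tokens at ids 32000-32003 even though that model's embedding matrix stops at 32000 rows, and the union vocabulary also contains ids with no mapping in the Python model's permutation (presumably from the base tokenizer's extra entries, hence the lookup of 32010). A quick way to see the mismatch (a sketch; only the repo names are taken from the log above):

# Compare tokenizer length against each checkpoint's embedding size
# (a diagnostic sketch; requires `transformers` and access to both repos).
from transformers import AutoConfig, AutoTokenizer

for repo in ("meta-llama/CodeLlama-13b-hf", "meta-llama/CodeLlama-13b-Python-hf"):
    tok = AutoTokenizer.from_pretrained(repo)
    cfg = AutoConfig.from_pretrained(repo)
    # If len(tok) > cfg.vocab_size, the tokenizer defines tokens that the
    # embedding matrix has no rows for -- the situation the warnings describe.
    print(f"{repo}: tokenizer={len(tok)} vocab_size={cfg.vocab_size}")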
The config used is ->
models:
  - model: meta-llama/CodeLlama-13b-Python-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - filter: embed_tokens
          value: 0
        - value: 1
  - model: meta-llama/CodeLlama-13b-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - filter: embed_tokens
          value: 0
        - value: 1
tokenizer_source: union
merge_method: dare_ties
base_model: meta-llama/CodeLlama-13b-hf
parameters:
  density: 0.5 # density gradient
  weight:
    - filter: embed_tokens
      value: 0
    - value: 1
  normalize: true
  int8_mask: true
dtype: float32
Okay, so the only workaround that I have (somehow) been able to use is to manually resize the model's embedding layer with Hugging Face ->
import transformers
from transformers import AutoTokenizer, LlamaForCausalLM
import torch

model = "meta-llama/CodeLlama-13b-hf"
print("================================FIRST MODEL STARTS HERE=================================")
tokenizer_normal = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Load the model
model_normal = LlamaForCausalLM.from_pretrained(model)
# Print the size of all layers in the model
for name, param in model_normal.named_parameters():
    print(f"Layer: {name} | Size: {param.size()}")

print("============================SECOND MODEL STARTS HERE====================================")
model = "meta-llama/CodeLlama-13b-Python-hf"
tokenizer_python = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Load the model
model_python = LlamaForCausalLM.from_pretrained(model)
# Resize the Python model's embedding layer to match the base tokenizer's length
model_python.resize_token_embeddings(len(tokenizer_normal))
# Print the size of all layers in the model
for name, param in model_python.named_parameters():
    print(f"Layer: {name} | Size: {param.size()}")
print(model_python.config)

# Check if the model's architecture is LlamaForCausalLM before pushing
if model_python.config.architectures[0] == "LlamaForCausalLM":
    # Push the second model to Hugging Face Hub
    model_python.push_to_hub("codellama-13b-hf-truncated-embeddings")
else:
    print("The model's architecture is not LlamaForCausalLM. Not pushing to hub.")
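As a quick sanity check on the variables above (resize_token_embeddings adjusts both the input embeddings and the lm_head to the new length):

# After the resize, both models should report embedding matrices with the
# same number of rows, so the shapes mergekit compares now line up.
print(model_normal.get_input_embeddings().weight.shape)
print(model_python.get_input_embeddings().weight.shape)  # rows == len(tokenizer_normal)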
and then use this ->
models:
  - model: choprahetarth/codellama-13b-hf-truncated-embeddings
    parameters:
      density: 0.5 # density gradient
      weight:
        - value: 1
  - model: meta-llama/CodeLlama-13b-hf
    parameters:
      density: 0.5 # density gradient
      weight:
        - value: 1
tokenizer_source: union
merge_method: dare_ties
base_model: meta-llama/CodeLlama-13b-hf
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    # - filter: embed_tokens
    #   value: 0 # use lm_head from 8b_stage2_final
    - value: 0.5
  embed_slerp: true
  normalize: true
  int8_mask: true
dtype: float16
to merge them together.
@cg123 I was wondering where exactly I should add this in my code (within mergekit, and possibly make a PR/branch as well). The library is way too complex for me to wrap my head around without documentation (as much as I respect you for writing it; I mean, it is amazing!). Just a small pointer on where I could add this as a contribution would be nice!
Hello! I actually have two models, CodeLlama-13b-Python and CodeLlama-13b, that need to be merged. The overall goal is to merge two models (one trained on Python and another trained on other languages). However, the biggest problem that I am facing is this -
Now the YAML file I have used to merge looks like this -
As far as I can see, the problem arises when I merge the two models, since their total numbers of tokens (and hence their embedding layer sizes) are different, even though all the other layers are the same.
How do I make sure that I IGNORE this layer, and keep everything else as is?
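One way to picture what "ignoring" that layer means (a hedged sketch, not mergekit's actual mechanism; it naively averages every shape-compatible tensor and keeps the base model's vocab-sized ones untouched):

# Illustration only: combine everything except the vocab-sized tensors.
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("meta-llama/CodeLlama-13b-hf")
other = LlamaForCausalLM.from_pretrained("meta-llama/CodeLlama-13b-Python-hf")

merged_state = base.state_dict()
for name, tensor in other.state_dict().items():
    if "embed_tokens" in name or "lm_head" in name:
        continue  # keep the base model's embeddings and output head as-is
    merged_state[name] = (merged_state[name] + tensor) / 2  # naive average
base.load_state_dict(merged_state)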
Also, somehow the model-stock method does work well. Just not sure why.
Other than the slight difference in the embedding layer size (which likely corresponds to a different vocabulary/token size), all the other layer dimensions are exactly the same between the two models:
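A compact way to verify that without downloading full weights is to diff the two configs (a sketch; everything except vocab_size should match):

# Compare architecture hyperparameters of the two checkpoints.
from transformers import AutoConfig

base_cfg = AutoConfig.from_pretrained("meta-llama/CodeLlama-13b-hf")
python_cfg = AutoConfig.from_pretrained("meta-llama/CodeLlama-13b-Python-hf")

for key in ("hidden_size", "intermediate_size", "num_hidden_layers",
            "num_attention_heads", "vocab_size"):
    print(f"{key}: {getattr(base_cfg, key)} vs {getattr(python_cfg, key)}")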