arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0
4.87k stars 445 forks

Trying to add Bloom architecture but something goes wrong #131

Open scarydemon2 opened 10 months ago

scarydemon2 commented 10 months ago

I first added BLOOM_INFO in architecture.py:

BLOOM_INFO = StaticTensorNames(
    name="BloomForCausalLM",
    pre_weight_names=["word_embeddings.weight"],
    post_weight_names=["ln_f.weight", "ln_f.bias"],
    embed_weight_names=["word_embeddings.weight"],
    layer_prefix_format="h.{idx}",
    layer_weight_suffixes=[
        "input_layernorm.weight",
        "input_layernorm.bias",
        "self_attention.query_key_value.weight",
        "self_attention.query_key_value.bias",
        "self_attention.dense.weight",
        "self_attention.dense.bias",
        "post_attention_layernorm.weight",
        "post_attention_layernorm.bias",
        "mlp.dense_h_to_4h.weight",
        "mlp.dense_h_to_4h.bias",
        "mlp.dense_4h_to_h.weight",
        "mlp.dense_4h_to_h.bias",
    ],
)
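For reference, a definition like this expands into one flat list of expected tensor names: pre-weights, then each layer's weights, then post-weights. A hypothetical sketch of that expansion (my own illustration, not mergekit's actual code):

```python
def expected_tensor_names(pre, post, prefix_fmt, suffixes, num_layers):
    """Enumerate the tensor names implied by a StaticTensorNames-style
    definition: pre-weights, then per-layer weights, then post-weights."""
    names = list(pre)
    for idx in range(num_layers):
        layer = prefix_fmt.format(idx=idx)
        names.extend(f"{layer}.{s}" for s in suffixes)
    names.extend(post)
    return names

# Two layers and a trimmed suffix list, just to show the shape of the output.
names = expected_tensor_names(
    pre=["word_embeddings.weight"],
    post=["ln_f.weight", "ln_f.bias"],
    prefix_fmt="h.{idx}",
    suffixes=["input_layernorm.weight", "self_attention.query_key_value.weight"],
    num_layers=2,
)
print(names[0], names[1], names[-1])
# → word_embeddings.weight h.0.input_layernorm.weight ln_f.bias
```

Every name in this list must match a tensor in each input checkpoint, which is why a stray prefix breaks the merge.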

Then I ran mergekit-yaml with this config:

models:
  - model: path1
    parameters:
      density: [1, 0.7, 0.1] # density gradient
      weight: 1.0
  - model: path2
    parameters:
      density: 0.5
      weight: [0, 0.3, 0.7, 1] # weight gradient
merge_method: ties
base_model: base_path
parameters:
  normalize: true
  int8_mask: true
dtype: float16
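As a side note, list-valued parameters such as density: [1, 0.7, 0.1] are layer gradients: the anchor values are interpolated across the model's depth. A rough piecewise-linear illustration of that idea (my own sketch, not mergekit's implementation):

```python
def interpolate_gradient(anchors, num_layers):
    """Linearly interpolate a short list of anchor values across
    num_layers per-layer values (e.g. [1, 0.7, 0.1] -> one value per layer)."""
    if len(anchors) == 1:
        return list(anchors) * num_layers
    out = []
    for i in range(num_layers):
        # Fractional position of layer i along the anchor list.
        pos = i / max(num_layers - 1, 1) * (len(anchors) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(anchors) - 1)
        frac = pos - lo
        out.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return out

# Five layers: values step smoothly from 1.0 down to 0.1.
grad = interpolate_gradient([1.0, 0.7, 0.1], 5)
print(grad)
```

A scalar value like density: 0.5 simply applies to every layer.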

But an error occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/options.py", line 59, in wrapper
    f(*args, **kwargs)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/merge.py", line 110, in run_merge
    exec.run(
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 261, in run
    for ref, tensor in tqdm.tqdm(self.generate_tensors(), total=len(self.targets)):
  File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 280, in generate_tensors
    schedule = self._schedule_ops()
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 376, in _schedule_ops
    dependencies, ops = self._build_dependencies()
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 416, in _build_dependencies
    _visit(target)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 413, in _visit
    _visit(dependency)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 405, in _visit
    raise RuntimeError(f"No rule to produce {node}")
RuntimeError: No rule to produce path1:word_embeddings.weight

So I'd like to know the right process for adding a BLOOM architecture. Looking forward to your help.

cg123 commented 10 months ago

It looks like BloomForCausalLM is in a similar situation to many GPT-family models, where model checkpoints can, but do not always, have a prefix (`transformer.`) on tensor names. For example, compare projecte-aina/FLOR-6.3B and bigscience/bloom-7b1: FLOR names the embedding transformer.word_embeddings.weight, while base Bloom names it just word_embeddings.weight.
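To check which naming a given checkpoint actually uses, you can list the tensor names from its safetensors header without loading any weights. A stdlib-only sketch (the demo file built here is made up; point the function at a real model.safetensors instead):

```python
import json
import struct

def safetensors_tensor_names(path):
    """Return the tensor names in a .safetensors file by reading only its
    header: an 8-byte little-endian length followed by a JSON dict."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return [k for k in header if k != "__metadata__"]

# Demo on a tiny hand-built file; a real checkpoint works the same way.
demo_header = json.dumps(
    {"word_embeddings.weight": {"dtype": "F16", "shape": [4, 4],
                                "data_offsets": [0, 32]}}
).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(demo_header)) + demo_header + b"\x00" * 32)

print(safetensors_tensor_names("demo.safetensors"))
# → ['word_embeddings.weight']
```

If the names come back with a transformer. prefix for one model and without it for another, that mismatch is exactly what trips the "No rule to produce" error.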

I am working on a more general fix for this sort of situation, but right now mergekit expects an architecture to have one consistent naming format. In the meantime, I'd try just loading and re-serializing the models like so:

import torch
import transformers

# torch_dtype (not dtype) is the from_pretrained argument for load precision
model = transformers.AutoModelForCausalLM.from_pretrained(
    "your/model", torch_dtype=torch.float16
)
model.save_pretrained("/workspace/model_reexport", safe_serialization=True)

If you do this to each model they should all be written with consistent names. I know this is hacky and inconvenient - sorry about that!

ymoslem commented 9 months ago

Hi @scarydemon2, may I ask if you managed to solve this? I am trying to do something similar. Thanks!