arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

passthrough merge error: Tensor model.layers.86.self_attn.k_norm.weight required but not present in model mistralai/Mistral-Large-Instruct-2407 #398

Closed AshD closed 3 months ago

AshD commented 3 months ago

Mergekit (as of 8/18/24): I am trying to create a passthrough merge, and it fails with this error: RuntimeError: Tensor model.layers.86.self_attn.k_norm.weight required but not present in model mistralai/Mistral-Large-Instruct-2407

The mergekit config is:

dtype: bfloat
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 30]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [5, 35]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [11, 31]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [15, 35]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [22, 42]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [25, 45]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [33, 53]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [40, 80]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [44, 87]
    model: mistralai/Mistral-Large-Instruct-2407
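As a quick sanity check on a config like this, the slice ranges can be summed to see how many layers the merged model will have, and each range can be checked against the source model's layer count. This is a minimal sketch and not part of mergekit: `summarize_slices` is a hypothetical helper, the ranges are treated as half-open `[start, end)`, and the 88-layer figure for Mistral-Large-Instruct-2407 is an assumption that should be confirmed against `num_hidden_layers` in the model's config.json.

```python
from typing import List, Tuple

# Layer ranges copied from the config above.
LAYER_RANGES: List[Tuple[int, int]] = [
    (0, 30), (5, 35), (11, 31), (15, 35), (22, 42),
    (25, 45), (33, 53), (40, 80), (44, 87),
]

# Assumption: the source model has 88 decoder layers (indices 0..87).
ASSUMED_NUM_LAYERS = 88


def summarize_slices(
    ranges: List[Tuple[int, int]], num_layers: int
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (total merged layers, list of out-of-bounds ranges)."""
    total = 0
    bad = []
    for start, end in ranges:
        # A valid half-open range satisfies 0 <= start < end <= num_layers.
        if not (0 <= start < end <= num_layers):
            bad.append((start, end))
        total += end - start
    return total, bad


total, bad = summarize_slices(LAYER_RANGES, ASSUMED_NUM_LAYERS)
print(f"merged layers: {total}, out-of-bounds ranges: {bad}")
```

Under these assumptions every slice stays within bounds, and the highest source layer referenced is 86, which suggests the layer index in the error is valid and the problem lies with the tensor name itself.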

Output

mergekit-yaml ./mergekit_config_mistral.yml ./models --cuda --allow-crimes --lazy-unpickle
Fetching 110 files: 100%|█████████████████████████████████████████████████████████| 110/110 [00:00<00:00, 202801.51it/s]
Warmup loader cache: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.54it/s]
Executing graph:   0%|                                                                | 1/8990 [00:00<00:15, 578.60it/s]
Traceback (most recent call last):
  File "/home/ash/miniconda3/envs/py310/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/home/ash/miniconda3/envs/py310/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ash/miniconda3/envs/py310/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ash/miniconda3/envs/py310/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ash/miniconda3/envs/py310/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ash/ai/mergekit/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/home/ash/ai/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/home/ash/ai/mergekit/mergekit/merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/home/ash/ai/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/home/ash/ai/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor model.layers.86.self_attn.k_norm.weight required but not present in model mistralai/Mistral-Large-Instruct-2407
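
The error says the merge plan requested a tensor name that the checkpoint does not provide. One way to see which side is wrong is to look the name up in the checkpoint's `model.safetensors.index.json`, whose `weight_map` maps tensor names to shard files. The following is a minimal sketch, assuming a sharded safetensors checkpoint; `missing_tensors` is a hypothetical helper, and the sample index (including the shard file names) is made-up illustrative data, not the real file.

```python
import json
from typing import Any, Dict, Iterable, List


def missing_tensors(index: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required tensor names that are absent from the index's weight_map."""
    weight_map = index.get("weight_map", {})
    return [name for name in required if name not in weight_map]


# Illustrative stand-in for a real model.safetensors.index.json.
# With a real checkpoint, read the file and pass it to json.load instead.
sample_index = json.loads("""
{
  "metadata": {},
  "weight_map": {
    "model.layers.86.self_attn.k_proj.weight": "shard-a.safetensors",
    "model.layers.86.self_attn.q_proj.weight": "shard-a.safetensors"
  }
}
""")

required = [
    "model.layers.86.self_attn.k_proj.weight",
    "model.layers.86.self_attn.k_norm.weight",
]
result = missing_tensors(sample_index, required)
print(result)
```

If a name is missing from the real index as well, the checkpoint is not at fault and the requested name must come from the architecture definition that mergekit is using.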

metric-space commented 3 months ago

Perhaps I am gravely mistaken, but is there any chance that the mistral.json file defining the Mistral architecture has been modified on your end? self_attn.k_norm.weight, in addition to being an odd weight name, doesn't exist here: https://github.com/arcee-ai/mergekit/blob/main/mergekit/_data/architectures/mistral.json
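
To check this locally, the architecture definition can be searched for the unexpected name. The sketch below is hedged: it makes no assumption about the JSON schema and simply walks every string in the document. `find_strings` is a hypothetical helper, and the sample definition is illustrative only, not a copy of mergekit's mistral.json.

```python
from typing import Any, Iterator


def find_strings(node: Any, needle: str) -> Iterator[str]:
    """Yield every string value or dict key in a JSON tree that contains needle."""
    if isinstance(node, str):
        if needle in node:
            yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            if needle in key:
                yield key
            yield from find_strings(value, needle)
    elif isinstance(node, list):
        for item in node:
            yield from find_strings(item, needle)


# Illustrative definition only. To inspect a local checkout, parse
# mergekit/_data/architectures/mistral.json with the json module instead.
sample_definition = {
    "layer_templates": {
        "weights": [
            {"name": "model.layers.${layer_index}.self_attn.k_proj.weight"},
            {"name": "model.layers.${layer_index}.self_attn.k_norm.weight"},
        ]
    }
}

hits = list(find_strings(sample_definition, "k_norm"))
print(hits)
```

Any hit for `k_norm` in a local copy would point to a modified definition; running `git status` or `git diff` in the checkout is another way to spot local changes.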

AshD commented 3 months ago

Thanks. That was it.