arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Example case of task_arithmetic needed #392

Open: Opdoop opened this issue 3 months ago

Opdoop commented 3 months ago

The parameter settings in the examples are too simple, which makes it hard to work out how to set the parameters for the different merge methods. For example, there is no task_arithmetic example. How do I merge different layers with different weights using task_arithmetic?
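For reference, task arithmetic forms a "task vector" for each fine-tuned model (its weights minus the base model's weights) and adds the weighted sum of those vectors back onto the base. The toy sketch below assumes `normalize: false` (as in the reply below) and represents each tensor as a flat list of floats; `task_arithmetic` is a hypothetical helper written for illustration, not a mergekit function. Giving different layers different weights amounts to applying a different weight to each layer's tensors.

```python
# Toy sketch of task arithmetic with normalize: false.
# Assumption: each "tensor" is a flat list of floats; a real merge applies
# this per tensor name across the full model weights.

def task_arithmetic(base, finetuned, weights):
    """Return base + sum_i weights[i] * (finetuned[i] - base), element-wise."""
    merged = list(base)
    for model, w in zip(finetuned, weights):
        for j, (b, f) in enumerate(zip(base, model)):
            merged[j] += w * (f - b)  # add this model's weighted task vector
    return merged

base = [1.0, 2.0, 3.0]
model_a = [1.5, 2.0, 2.0]  # task vector [0.5, 0.0, -1.0]
model_b = [1.0, 3.0, 3.0]  # task vector [0.0, 1.0, 0.0]

# A per-layer schedule amounts to a different weight for each layer's tensors.
print(task_arithmetic(base, [model_a, model_b], [1.0, 1.0]))  # [1.5, 3.0, 2.0]
print(task_arithmetic(base, [model_a, model_b], [0.5, 0.5]))  # [1.25, 2.5, 2.5]
```

Since the base model's own task vector is zero, listing the base model among the models (as the config below does) adds nothing to the result.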

jukofyork commented 3 months ago

If it's any help, the way I did it was to run task_arithmetic first and then a linear merge afterwards to select which layers / tensor names come from which model:

https://huggingface.co/jukofyork/miquplus-midnight-70b

name: _miquplus-midnight-70b
merge_method: task_arithmetic
parameters:
  normalize: false
  weight: 1
models:
  - model: meta-llama/Llama-2-70b-hf
  - model: 152334H/miqu-1-70b-sf
  - model: sophosympatheia/Midnight-Rose-70B-v2.0.3
base_model: meta-llama/Llama-2-70b-hf
dtype: float16
---
name: miquplus-midnight-70b
merge_method: linear
models:
  - model: 152334H/miqu-1-70b-sf
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: up_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - value: 1
  - model: _miquplus-midnight-70b
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: up_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - value: 0
base_model: 152334H/miqu-1-70b-sf
tokenizer_source: base
dtype: float16
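The per-layer weights in the second stage are gradient lists combined with `filter` entries. As I understand mergekit's behaviour, a gradient's anchor values are spread evenly from the first layer to the last and linearly interpolated in between, a `filter` entry applies only to tensors whose names contain that substring, and the entry without a `filter` is the fallback. The sketch below is a hypothetical re-implementation of that resolution, not mergekit's own code; the substring matching, the interpolation, the 80-layer count and `t = layer / (NUM_LAYERS - 1)` are all assumptions.

```python
import math

# Hypothetical resolver (not mergekit's code) for a per-tensor parameter such
# as `weight`, evaluated at depth t from 0.0 (first layer) to 1.0 (last layer).
def resolve(setting, tensor_name, t):
    if isinstance(setting, (int, float)):
        return float(setting)  # scalar: same value for every tensor
    if all(isinstance(e, (int, float)) for e in setting):
        # Gradient list: anchors evenly spaced over [0, 1]; interpolate
        # linearly between the two nearest anchors.
        scaled = t * (len(setting) - 1)
        i0 = int(math.floor(scaled))
        i1 = min(len(setting) - 1, i0 + 1)
        frac = scaled - i0
        return (1 - frac) * setting[i0] + frac * setting[i1]
    # Filtered list: first entry whose filter is a substring of the tensor
    # name wins; an entry without "filter" matches everything (fallback).
    for entry in setting:
        f = entry.get("filter")
        if f is None or f in tensor_name:
            return resolve(entry["value"], tensor_name, t)
    raise ValueError(f"no entry matches {tensor_name!r}")

EDGES = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # gradient used for miqu
MIDDLE = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # gradient used for _miquplus-midnight-70b
FILTERED = ["v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"]
miqu_weight = [{"filter": f, "value": EDGES} for f in FILTERED] + [{"value": 1}]
merged_weight = [{"filter": f, "value": MIDDLE} for f in FILTERED] + [{"value": 0}]

NUM_LAYERS = 80  # assumption: 80 decoder layers, as in Llama-2-70B
for layer in (0, 8, 40, 79):
    t = layer / (NUM_LAYERS - 1)  # assumption: how t is derived from the layer index
    for proj in ("self_attn.q_proj", "self_attn.v_proj", "mlp.down_proj"):
        name = f"model.layers.{layer}.{proj}.weight"
        a = resolve(miqu_weight, name, t)
        b = resolve(merged_weight, name, t)
        print(f"layer {layer:2d} {proj:16s} miqu={a:.3f} task_arith={b:.3f}")
```

Under these assumptions, the filtered v_proj, o_proj and MLP tensors come almost entirely from miqu in the first and last couple of layers and from the task-arithmetic merge in the middle, while every unfiltered tensor (q_proj, k_proj, embeddings, norms) comes from miqu alone because the fallback weights are 1 and 0. The two weights sum to 1 for every tensor.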