Opdoop opened 3 months ago
If it's any help, the way I did it was to use `task_arithmetic` first and then `linear` afterwards to filter by layer / tensor name:
https://huggingface.co/jukofyork/miquplus-midnight-70b
```yaml
name: _miquplus-midnight-70b
merge_method: task_arithmetic
parameters:
  normalize: false
  weight: 1
models:
  - model: meta-llama/Llama-2-70b-hf
  - model: 152334H/miqu-1-70b-sf
  - model: sophosympatheia/Midnight-Rose-70B-v2.0.3
base_model: meta-llama/Llama-2-70b-hf
dtype: float16
---
name: miquplus-midnight-70b
merge_method: linear
models:
  - model: 152334H/miqu-1-70b-sf
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: up_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - value: 1
  - model: _miquplus-midnight-70b
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: up_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - value: 0
base_model: 152334H/miqu-1-70b-sf
tokenizer_source: base
dtype: float16
```
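The second stage relies on two pieces of mergekit's parameter syntax, as I understand them: a `filter` entry applies when its string appears in the tensor name (first match wins, and an entry without `filter` acts as the fallback), and a list `value` is a gradient interpolated linearly across the layers. The Python sketch below re-implements that resolution for the config above so you can see which model each tensor comes from. The helpers `interpolate`, `resolve` and `layer_t` are hypothetical, and mapping the layer index to a position `t` (0 at the first layer, 1 at the last) is an assumption rather than something taken from mergekit's source.

```python
import math

# Hypothetical re-implementation of how the second-stage config resolves a
# weight for each tensor; an illustration, not mergekit's own code.
NUM_LAYERS = 80  # Llama-2-70B (and miqu-1-70b) have 80 decoder layers

EDGES = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
MIDDLE = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
PROJS = ["v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"]

# One filtered rule per projection, then an unfiltered fallback, as in the YAML.
miqu_rules = [{"filter": p, "value": EDGES} for p in PROJS] + [{"value": 1}]
stage1_rules = [{"filter": p, "value": MIDDLE} for p in PROJS] + [{"value": 0}]


def interpolate(gradient, t):
    """Piecewise-linear interpolation of an anchor list at position t in [0, 1]."""
    scaled = t * (len(gradient) - 1)
    i0 = int(math.floor(scaled))
    i1 = min(len(gradient) - 1, i0 + 1)
    frac = scaled - i0
    return (1 - frac) * gradient[i0] + frac * gradient[i1]


def resolve(tensor_name, rules, t):
    """Value of the first rule whose filter is a substring of tensor_name.

    A rule without a filter matches every tensor (the fallback).
    """
    for rule in rules:
        f = rule.get("filter")
        if f is None or f in tensor_name:
            v = rule["value"]
            return interpolate(v, t) if isinstance(v, list) else float(v)
    raise ValueError(f"no rule matches {tensor_name}")


def layer_t(layer_idx):
    # Assumption: t is 0 at the first layer and 1 at the last.
    return layer_idx / (NUM_LAYERS - 1)


for layer in (0, 10, 40, 79):
    for proj in ("mlp.down_proj", "self_attn.q_proj"):
        name = f"model.layers.{layer}.{proj}.weight"
        t = layer_t(layer)
        w_miqu = resolve(name, miqu_rules, t)
        w_stage1 = resolve(name, stage1_rules, t)
        print(f"{name}: miqu={w_miqu:.3f} stage1={w_stage1:.3f}")
```

Under these assumptions, `down_proj` in layers 0 and 79 comes entirely from miqu, layer 40 entirely from the stage-1 merge, with a linear ramp in between, and the two weights always sum to 1. Tensors that no filter matches (e.g. `q_proj`, `k_proj`, norms, embeddings) fall through to the last entry and come entirely from miqu.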
The parameter settings in the examples are too simple, and it's really hard to work out how to set parameters for the different methods. For example, there is no `task_arithmetic` example. How do you merge different layers with different weights in `task_arithmetic`?
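On the question of per-layer weights in `task_arithmetic`: that method adds each model's difference from the base (its task vector), scaled by its weight, and if I read the parameter syntax correctly, a list-valued `weight` under a model's `parameters` is interpolated across layers just as in the `linear` stage of the config above. The sketch below shows that arithmetic on toy NumPy tensors. `layer_weight`, `task_arithmetic_tensor`, the two toy models and their two-point gradients are all hypothetical illustrations, not mergekit code, and the layer-to-`t` mapping is an assumption.

```python
import numpy as np


def layer_weight(gradient, layer_idx, num_layers):
    """Piecewise-linear interpolation of a gradient list at this layer.

    Assumption: t runs from 0 at the first layer to 1 at the last.
    """
    t = layer_idx / (num_layers - 1)
    anchors = np.linspace(0.0, 1.0, len(gradient))
    return float(np.interp(t, anchors, gradient))


def task_arithmetic_tensor(base, finetuned, weights, normalize=False):
    """Merge one tensor: base + sum_i w_i * (theta_i - base)."""
    delta = sum(w * (m - base) for m, w in zip(finetuned, weights))
    if normalize:
        # Divide the summed task vectors by the total weight.
        delta = delta / sum(weights)
    return base + delta


# Toy 5-layer "models" where every layer is a 2x2 tensor (hypothetical data).
rng = np.random.default_rng(0)
num_layers = 5
base = [rng.normal(size=(2, 2)) for _ in range(num_layers)]
model_a = [b + 1.0 for b in base]  # task vector of +1 everywhere
model_b = [b - 2.0 for b in base]  # task vector of -2 everywhere

# Hypothetical per-model gradients: model_a fades out, model_b fades in.
grad_a = [1.0, 0.0]
grad_b = [0.0, 1.0]

merged = []
for i in range(num_layers):
    w = [layer_weight(grad_a, i, num_layers), layer_weight(grad_b, i, num_layers)]
    merged.append(task_arithmetic_tensor(base[i], [model_a[i], model_b[i]], w))

for i, (m, b) in enumerate(zip(merged, base)):
    print(i, np.round(m - b, 3)[0, 0])
```

Layer 0 receives only model A's task vector (+1), the last layer only model B's (-2), and the middle layer half of each (0.5 - 1 = -0.5). With `normalize=True` the summed task vectors are divided by the total weight; the first-stage config above sets `normalize: false` with `weight: 1`, so both task vectors are added at full strength.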