KyujinHan / Sakura-SOLAR-DPO

Sakura-SOLAR-DPO: Merge, SFT, and DPO

Could you share some tips on the parameters for merging? #3

Closed Yongtae723 closed 8 months ago

Yongtae723 commented 8 months ago

Hi @KyujinHan !

Thank you so much for sharing your thoughts and your model. I believe your passion and effort will drive great progress in LLM development for all engineers!

I would like to ask for your tips on running a good merging experiment, because I am new to merging LLMs.

I tried to apply your method to make a Japanese model. The strategy is to merge openchat/openchat-3.5-1210 and lightblue/karasu-7B.

The difficulties I ran into are:

Besides those difficulties, I ran my first merge with the YAML below, but the output of the merged model is not as good as either of the two original models.

Can I ask for your thoughts?

slices:
  - sources:
      - model: openchat/openchat-3.5-1210
        layer_range: [0, 32]
      - model: lightblue/karasu-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
tokenizer_source: union
parameters:
  t:  # interpolation factor: 0 keeps the base-model side, 1 keeps the other model
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # gradient across layer blocks for attention weights
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # opposite gradient for MLP weights
    - value: 0.5                     # default for all remaining tensors
dtype: bfloat16
Yongtae723 commented 8 months ago

Or could you share a link where I can learn some merging tips?

Thanks!

KyujinHan commented 8 months ago

Hello!

I saw your models at the link below:

I don't know for sure, but I can suggest some ideas:

  1. You can consider another merge method: passthrough.
    (image: a diagram of the passthrough, i.e. layer-stacking, merge; a rough config sketch follows below)

I suggest,
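For reference, a minimal sketch of what a passthrough (layer-stacking) config looks like in mergekit's format, using the two models from this thread. The layer ranges are arbitrary placeholders, not tuned values, and whether this works cleanly when the two models use different tokenizers is a separate question:

slices:
  - sources:
      - model: openchat/openchat-3.5-1210
        layer_range: [0, 24]    # earlier block of layers taken from openchat
  - sources:
      - model: lightblue/karasu-7B
        layer_range: [8, 32]    # later block of layers taken from karasu
merge_method: passthrough       # layers are copied as-is and stacked, no interpolation
dtype: bfloat16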

  2. Is there another Japanese LLM?
    • When I checked the models, lightblue/karasu-7B has a vocab size of 120128, but openchat/openchat-3.5-1210 has a vocab size of 32002.
    • I think that is not a good fit for merging.

I suggest,
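As a rough illustration of this idea, a slerp config where both models share (or closely match) the same tokenizer might look like the sketch below. The Japanese model name is a placeholder for whatever Mistral-based Japanese fine-tune you choose, not a specific recommendation:

slices:
  - sources:
      - model: openchat/openchat-3.5-1210      # vocab size 32002
        layer_range: [0, 32]
      - model: your-org/japanese-mistral-7b    # placeholder: a Japanese model whose tokenizer/vocab matches
        layer_range: [0, 32]
merge_method: slerp
base_model: openchat/openchat-3.5-1210         # slerp interpolates from the base model toward the other model
parameters:
  t:
    - value: 0.5                               # even blend for all tensors
dtype: bfloat16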


I don't know if this is a clear answer. I have not tried merging models with different vocab sizes, so I don't know whether what I suggested would be appropriate. Thanks! 😄😄

Yongtae723 commented 8 months ago

Thank you for the kind answers!

Your answers make a lot of sense! Again, thank you for sharing your thoughts and code! You are awesome!