Yongtae723 closed this issue 8 months ago.
Or could you point me to a link where I can learn some merging tips?
Thanks!
Hello!
I saw your models at the link below:
I am not sure exactly, but let me suggest some ideas:
slices:
  - sources:
      - model: seungduk/KoSOLAR-10.7B-v0.1
        layer_range: [0, 36]
  - sources:
      - model: seungduk/KoSOLAR-10.7B-v0.1
        layer_range: [12, 48]
merge_method: passthrough
dtype: float16
I suggest the passthrough method for your models lightblue/karasu-7B and openchat/openchat-3.5-1210, for example along the lines of the sketch below.
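This is only a rough, untested sketch: the layer ranges are illustrative values for 32-layer Mistral-7B-based models, not something I have verified.

slices:
  - sources:
      - model: openchat/openchat-3.5-1210
        layer_range: [0, 24]   # illustrative range, not tested
  - sources:
      - model: lightblue/karasu-7B
        layer_range: [8, 32]   # illustrative range, not tested
merge_method: passthrough
dtype: float16

The overlapping ranges just mirror the KoSOLAR-style depth up-scaling config above; note that this sketch does not address the vocab-size mismatch discussed next.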
However, lightblue/karasu-7B's vocab size is 120128, while openchat/openchat-3.5-1210's vocab size is 32002. So I suggest either expanding openchat/openchat-3.5-1210's vocab and fine-tuning it with a Japanese dataset, or merging lightblue/karasu-7B with another Japanese LLM instead. I am not sure this is a clear answer; I have not tried merging models with different vocab sizes, so I don't know if what I suggested would be appropriate. Thanks! 😄😄
Thank you for your kind answers!
They make a lot of sense! Again, thank you for sharing your thoughts and code! You are awesome!
Hi @KyujinHan!
I really appreciate you sharing your thoughts and models. I believe your passion and effort will drive great progress in developing LLMs for all engineers!
I would like to know your tips for designing a good merging experiment, because I am new to merging LLMs.
I tried to follow your method to make a Japanese model. The strategy is to merge openchat/openchat-3.5-1210 and lightblue/karasu-7B. The difficulties are:
- These models are based on mistralai/Mistral-7B-v0.1, so I can not use your parameters directly (I guess).
- lightblue/karasu-7B uses an expanded tokenizer, so I used tokenizer_source: union, but I don't know whether this is a good approach.
Besides those difficulties, I merged with the following YAML for the first time, but the output of the merged model is not as good as either of the original two models.
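Roughly, it was a two-model merge of these models with tokenizer_source: union, along the lines of the sketch below (the merge method, layer ranges, and parameters in the sketch are only illustrative placeholders, not the exact settings):

slices:
  - sources:
      - model: openchat/openchat-3.5-1210
        layer_range: [0, 32]
      - model: lightblue/karasu-7B
        layer_range: [0, 32]
merge_method: slerp           # assumed method for this sketch
base_model: openchat/openchat-3.5-1210
parameters:
  t: 0.5                      # illustrative 50/50 interpolation
dtype: float16
tokenizer_source: union       # build a tokenizer covering both vocabularies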
Can I ask for your thoughts?