arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Japanese LLMs merge - Chat template #235

Closed. AkimfromParis closed this 7 months ago.

AkimfromParis commented 7 months ago

Hello,

Thank you for your work on MergeKit. 🔥

Today, I created 6 merged models to test bilingual (Japanese/English) LLMs, using only the SLERP method: 2 based on Llama (Heliotrope), 2 based on Mistral (Neroli), and 2 based on Japanese-only Mistral models (Hinoki). I will benchmark these Japanese LLMs later today.

🌱 AkimfromParis/Heliotrope-Ely-Swa-slerp-7B

🍋 AkimfromParis/Neroli-Rak-Lig-slerp-7B

🌲 AkimfromParis/Hinoki-Sak-Sta-slerp-7B
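For context, the SLERP configs were roughly along these lines. This is only a minimal sketch: the model names are placeholders (not the exact repos I merged), the output path is arbitrary, and the Python entry points (MergeConfiguration, MergeOptions, run_merge) are the ones shown in the mergekit README.

```python
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Placeholder SLERP config in mergekit's YAML format: two 32-layer 7B models,
# with different interpolation curves for the self_attn and mlp weights.
CONFIG_YAML = """
slices:
  - sources:
      - model: org/japanese-mistral-7b   # placeholder
        layer_range: [0, 32]
      - model: org/english-mistral-7b    # placeholder
        layer_range: [0, 32]
merge_method: slerp
base_model: org/japanese-mistral-7b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))
run_merge(
    merge_config,
    "./merged-slerp-7b",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```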

I used Maxime's LazyMergekit, which produces the following chat template.

messages = [{"role": "user", "content": "What is a large language model?"}]

Yes, it's not in Japanese; I will translate it. 😃 Should I keep this chat_template? Or should I revert to the Llama and Mistral templates with their special tokens?
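For reference, here is a minimal sketch of how I can inspect what the merged tokenizer actually renders, assuming the standard transformers apply_chat_template API (the repo name is just one of the merges above):

```python
from transformers import AutoTokenizer

# Render the example messages with the merged model's tokenizer to see which
# prompt format (ChatML-style, Mistral [INST] tags, etc.) it actually produces.
tok = AutoTokenizer.from_pretrained("AkimfromParis/Neroli-Rak-Lig-slerp-7B")
messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```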

Thank you in advance!

Akim

cg123 commented 7 months ago

Hi! Glad you're finding it useful.

I'd recommend using the chat template of the models you've merged, if they share a common one. If not, the template of the model that has a higher weight in the lower layers will often get better responses.
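For example, something along these lines can carry a template over to the merged tokenizer. This is only a sketch: it assumes a Mistral-style instruct model is the higher-weighted donor and uses a placeholder output path, so adjust the repo names to your actual models.

```python
from transformers import AutoTokenizer

# Copy the chat template from the donor model that dominates the lower layers
# of the merge onto the merged model's tokenizer, then save it so it ends up
# in tokenizer_config.json.
donor = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")  # assumed donor
merged = AutoTokenizer.from_pretrained("./merged-slerp-7b")                  # your merged model
merged.chat_template = donor.chat_template
merged.save_pretrained("./merged-slerp-7b")
```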

Hope this helps!

AkimfromParis commented 7 months ago

Thank you for your help!

I got more impressive results merging Mistral LLMs than Llama 2 LLMs. I was slightly confused that Mistral uses "tokenizer_class": "LlamaTokenizer". And the chat template was removed from tokenizer_config.json by most of the companies building on Mistral... But then why not... 😃
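For anyone else confused by this, a quick sketch of how to check both things (the repo name below is just an example Mistral-family model, and the printed values depend on what that repo actually ships):

```python
import json
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

repo = "mistralai/Mistral-7B-v0.1"  # example Mistral-family repo

# Look at the raw tokenizer_config.json: Mistral repos reuse the Llama
# tokenizer class, and many ship no chat_template entry at all.
cfg_path = hf_hub_download(repo, "tokenizer_config.json")
with open(cfg_path, encoding="utf-8") as fp:
    cfg = json.load(fp)
print(cfg.get("tokenizer_class"))   # e.g. "LlamaTokenizer"
print("chat_template" in cfg)       # False if the repo ships no chat template

tok = AutoTokenizer.from_pretrained(repo)
print(tok.chat_template)            # None when no template is defined
```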

Thanks again! Akim