AkimfromParis closed this issue 7 months ago.
Hi! Glad you're finding it useful.
I'd recommend using the chat template of the models you've merged, if they share a common one. If they don't, the template of the model that has the higher weight in the lower layers will often get better responses.
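For example, here is a minimal sketch with transformers of checking and applying whatever template the merged tokenizer carries (the repo name is a placeholder, not a specific model):

```python
from transformers import AutoTokenizer

# Placeholder repo name -- substitute your own merged model.
tokenizer = AutoTokenizer.from_pretrained("your-name/your-slerp-merge-7B")

# If the parent models shared a template, the merged tokenizer may already
# carry it in tokenizer_config.json; inspect it before generating.
print(tokenizer.chat_template)

# Format a conversation with whatever template is set.
messages = [{"role": "user", "content": "Hello! How are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```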
Hope this helps!
Thank you for your help!
I got more impressive results merging Mistral LLMs than Llama 2 LLMs. I was slightly confused by Mistral using "tokenizer_class": "LlamaTokenizer", and by the chat template being removed from tokenizer_config.json by most of the companies building on Mistral... But then, why not... 😃
Thanks again! Akim
Hello,
Thank you for your work on MergeKit. 🔥
Today, I created 6 merged models to test bilingual (Japanese/English) LLMs using only the SLERP method: 2 based on Llama (Heliotrope), 2 based on Mistral (Neroli), and 2 based on Japanese-only Mistral (Hinoki). A sketch of the SLERP config format follows the list below. I will benchmark those Japanese LLMs later today.
🌱 AkimfromParis/Heliotrope-Ely-Swa-slerp-7B
🍋 AkimfromParis/Neroli-Rak-Lig-slerp-7B
🌲 AkimfromParis/Hinoki-Sak-Sta-slerp-7B
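For reference, a minimal sketch of the kind of SLERP configuration these merges use with mergekit, written from Python so the file can be generated and passed to the CLI; the model names and layer ranges are placeholders, not the exact models listed above:

```python
# Sketch of a mergekit SLERP config; model names and layer_range values are
# placeholders, not the exact models listed above.
slerp_config = """\
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: example-org/japanese-mistral-7B   # placeholder
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

with open("config.yaml", "w") as f:
    f.write(slerp_config)

# Then merge with the mergekit CLI, e.g.:
#   mergekit-yaml config.yaml ./merged-model --copy-tokenizer
```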
I used Maxime's LazyMergeKit, which produces the following chat template.
Yes, it's not in Japanese; I will translate it. 😃 Should I keep this chat_template, or should I revert back to the Llama and Mistral special tokens?
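For reference, if reverting, a minimal sketch of copying a base model's template onto the merged tokenizer (assuming a Mistral-Instruct base; adjust to whichever base fits the merge):

```python
from transformers import AutoTokenizer

# One of the merges above, plus an assumed Mistral-Instruct base whose
# [INST]-style template should be reused.
merged = AutoTokenizer.from_pretrained("AkimfromParis/Neroli-Rak-Lig-slerp-7B")
base = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Overwrite the template that LazyMergeKit wrote into tokenizer_config.json,
# then save a local copy with the base model's template.
merged.chat_template = base.chat_template
merged.save_pretrained("./Neroli-Rak-Lig-slerp-7B-mistral-template")
```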
Thank you in advance!
Akim