arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Awesome repo, but can it convert multiple architectures to Llama? #371

Open BBC-Esq opened 4 months ago

BBC-Esq commented 4 months ago

I came across your Hugging Face repo and found the convert_weights.py script there, and I seem to have successfully used it to convert this model:

https://huggingface.co/internlm/internlm2_5-7b-chat

The purpose of converting to Llama is to make the model compatible with CTranslate2, which currently does not support the InternLM architecture.

However, when trying to convert this model with CTranslate2, I received the following error:

NotImplementedError: RoPE scaling type 'dynamic' is not yet implemented. The following RoPE scaling types are currently supported: linear, su

Thus, it appears that CTranslate2 doesn't support the 'dynamic' RoPE scaling type yet.

Is there a way to either modify the RoPE scaling type in the converted model or remove it altogether?
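In case it helps clarify what I'm asking: my understanding is that the RoPE scaling type lives in the model's config.json, so dropping the rope_scaling entry before conversion might sidestep the error. A hedged sketch of what I mean (the path is hypothetical, and removing the entry presumably costs long-context quality beyond the base training window):

```python
import json
from pathlib import Path


def strip_rope_scaling(config: dict) -> dict:
    """Return a copy of a Hugging Face model config with rope_scaling removed.

    Without the key, the model should fall back to plain RoPE; this is an
    assumption on my part, not something the mergekit docs state.
    """
    cleaned = dict(config)
    cleaned.pop("rope_scaling", None)
    return cleaned


# Usage (hypothetical path to the converted model's directory):
# cfg_path = Path("internlm2_5-7b-chat-llama/config.json")
# cfg = json.loads(cfg_path.read_text())
# cfg_path.write_text(json.dumps(strip_rope_scaling(cfg), indent=2))
```

Would the converted weights still be valid after a change like that, or does something else depend on the scaling type?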

Secondly, do you have any more scripts that can convert other architectures to Llama? I'd love to convert multiple model architectures to Llama solely to use them with CTranslate2. Perhaps mergekit can do this already, but I didn't see the convert_weights.py script anywhere in this repository, only on Hugging Face, nor did I see anything in the README about converting a model's weights to a different architecture. Your help would be much appreciated! Starred this repo.
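For context, my mental model of such a conversion is that when the architectures are structurally identical, it's mostly a tensor-renaming exercise; is that roughly what convert_weights.py does? A minimal sketch of the idea (the mapping entries are illustrative, not the real InternLM2-to-Llama mapping, and I realize some architectures also need tensor surgery, e.g. splitting a fused QKV projection, which simple renaming can't handle):

```python
# Illustrative rename map; NOT the actual InternLM2 -> Llama key mapping.
RENAME_MAP = {
    "model.tok_embeddings.weight": "model.embed_tokens.weight",
    "output.weight": "lm_head.weight",
}


def rename_state_dict(state_dict: dict, rename_map: dict) -> dict:
    """Re-key a state dict so the tensors match the target architecture's
    expected parameter names; tensors themselves are passed through unchanged."""
    return {rename_map.get(name, name): tensor for name, tensor in state_dict.items()}
```

If that's the right picture, then supporting a new source architecture would mostly mean writing a new rename map (plus any split/merge steps), which is why I'm hoping more such scripts exist.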