arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0
4.88k stars 446 forks source link

Better tied weight handling #464

Closed cg123 closed 1 day ago

cg123 commented 1 day ago

Handle cases where some input models have a tied tensor and some don't.

For example, there are some fine tunes of Llama 3.2 3B floating around that are ~3.6B parameters because they have a separate LM head - with these changes these can be merged with standard sized ones. There will be a LM head in the output model if any inputs have one. Otherwise behavior will be as it was before.