arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Can we use algorithms to automatically optimize the merging of the weights and layers of the model along the most efficient path? #38

Open · win10ogod opened this issue 7 months ago

win10ogod commented 7 months ago

Can we use algorithms to automatically optimize the merging of the weights and layers of the model along the most efficient path?
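
One concrete reading of "automatically optimize" would be to treat the merge coefficients as hyperparameters and search them against a held-out metric. A minimal sketch of that idea, not anything mergekit currently does; `evaluate` is a hypothetical user-supplied scoring function and both checkpoints are assumed to share an architecture:

```python
import copy
import random

def linear_merge(model_a, model_b, alpha):
    """Weight-space interpolation: theta = alpha*theta_a + (1-alpha)*theta_b.
    Assumes both models share an architecture and parameter names."""
    merged = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a})
    return merged

def search_merge(model_a, model_b, evaluate, trials=20):
    """Random search over the interpolation coefficient, keeping the
    candidate merge that scores best on a held-out evaluation."""
    best_alpha, best_score = None, float("-inf")
    for _ in range(trials):
        alpha = random.random()
        score = evaluate(linear_merge(model_a, model_b, alpha))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score
```

Extending the single coefficient to a per-layer vector gets closer to "optimizing the path" in the title, at the cost of a much larger search space, where something like evolutionary or Bayesian search would be more natural than random search.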

shamanez commented 7 months ago

@win10ogod I also have a similar question. When we use the passthrough method, is there any principled way to select layers from each model? Could we use something like task arithmetic values to pick the most useful layers?
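
For what it's worth, passthrough layer selection amounts to building a new state dict from chosen layer ranges of each donor model, which is what mergekit's `slices` / `layer_range` config expresses. A rough sketch of the underlying operation, with hypothetical model names and arbitrary layer ranges (it ignores embeddings, the final norm, and the LM head, which mergekit handles for you):

```python
from transformers import AutoModelForCausalLM

def take_layers(state_dict, old_indices, new_start):
    """Copy the chosen transformer blocks, renumbering them so they occupy
    consecutive positions in the stitched model (Llama-style key names)."""
    out = {}
    for offset, old_i in enumerate(old_indices):
        old_prefix = f"model.layers.{old_i}."
        new_prefix = f"model.layers.{new_start + offset}."
        for name, tensor in state_dict.items():
            if name.startswith(old_prefix):
                out[new_prefix + name[len(old_prefix):]] = tensor.clone()
    return out

sd_a = AutoModelForCausalLM.from_pretrained("model-a").state_dict()  # hypothetical
sd_b = AutoModelForCausalLM.from_pretrained("model-b").state_dict()  # checkpoints

stitched = {}
stitched.update(take_layers(sd_a, range(0, 16), new_start=0))   # blocks 0-15 from A
stitched.update(take_layers(sd_b, range(8, 24), new_start=16))  # blocks 8-23 from B
```

On the task-arithmetic idea: one could rank layers by the norm of their task vector (finetuned weights minus base weights) and keep the largest, though I haven't seen evidence that this reliably identifies the most useful layers.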

vince62s commented 6 months ago

I am pretty sure you are asking the same question as the one I am looking into (if not, sorry to hijack this post). I read the model soups paper mentioned by @cg123 here: https://arxiv.org/pdf/2203.05482.pdf

In section 4 they compare "soups" against ensembling. If I am not mistaken, souping is well suited to models that share the same initialization weights (seed); otherwise the models take completely different paths, and averaging their weights is either irrelevant or requires post-training (finetuning) that may or may not be beneficial. Ensembling, on the other hand, is suited to different models, since it acts at the logits level, hence taking the "best path" mentioned in the title. Ensembling is de facto superior to the "soup" (as they call it in the paper).

So the question is: do methods other than linear better emulate ensembling for models that do not share the same initialization?

Am I correct?
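
For concreteness, here is a minimal sketch of the two operations being compared, assuming same-architecture causal LMs with a shared tokenizer (the models and inputs are placeholders):

```python
import copy
import torch

def soup(models):
    """Weight-space merge: average the parameters. Mainly meaningful when
    the models were finetuned from the same initialization."""
    merged = copy.deepcopy(models[0])
    sds = [m.state_dict() for m in models]
    merged.load_state_dict(
        {k: torch.stack([sd[k].float() for sd in sds]).mean(dim=0) for k in sds[0]}
    )
    return merged

@torch.no_grad()
def ensemble_logits(models, input_ids):
    """Output-space merge: average the logits at inference time. Works for
    models with different initializations, but costs one forward pass each."""
    return torch.stack([m(input_ids).logits for m in models]).mean(dim=0)
```

That is the trade-off the paper is pointing at: a soup yields a single model at single-model inference cost but assumes averaging the endpoints in weight space is meaningful, while an ensemble pays N forward passes and makes no such assumption. So the open question is indeed whether methods other than linear (mergekit's slerp, ties, task_arithmetic, ...) close some of that gap for models that do not share an initialization.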