arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Small-into-Large Merging: Trying to figure out an approach #285

Open jaggzh opened 6 months ago

jaggzh commented 6 months ago

I'm wondering about the possibility of fine-tuning a small model (I have limited resources for training a large one), upscaling it in the "best" way possible, and merging it with a large model. The motivation is that inference has much less overhead than training and can run on systems where training a model of that size is not possible, yet I'd still like to incorporate new knowledge into such a model. Has this been done? What approaches would it take?

shamanez commented 6 months ago

@jaggzh Great question! We're actively exploring this and open to collaborations. So far there aren't any papers that concretely address merging models with different architectures. It's worth noting that this specific problem would likely require reconciling different hidden sizes and layer counts.

However, there is one notable work from DeepMind, "LLM Augmented LLMs: Expanding Capabilities through Composition" (CALM), which uses cross-attention to compose two models, though it requires training: https://arxiv.org/abs/2401.02412
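To make the CALM idea concrete, here is a minimal, hypothetical PyTorch sketch of the composition mechanism: a frozen large "anchor" model cross-attends to hidden states from a frozen small "augmenting" model through a newly trained bridge block. The dimensions and class names are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of CALM-style composition (arXiv:2401.02412):
# only the bridge (projection + cross-attention) is trained; both base models stay frozen.
import torch
import torch.nn as nn


class CompositionBlock(nn.Module):
    """Trainable bridge: projects augmenting-model states into the anchor's
    hidden size and lets the anchor cross-attend to them."""

    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(aug_dim, anchor_dim)
        self.cross_attn = nn.MultiheadAttention(anchor_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(anchor_dim)

    def forward(self, anchor_hidden, aug_hidden):
        # anchor_hidden: (batch, seq, anchor_dim); aug_hidden: (batch, seq, aug_dim)
        kv = self.proj(aug_hidden)
        attn_out, _ = self.cross_attn(query=anchor_hidden, key=kv, value=kv)
        # Residual connection preserves the frozen anchor's behaviour when attn_out is small.
        return self.norm(anchor_hidden + attn_out)


if __name__ == "__main__":
    anchor_dim, aug_dim = 4096, 2048  # e.g. a 7B anchor and a much smaller fine-tuned model
    block = CompositionBlock(anchor_dim, aug_dim)
    anchor_h = torch.randn(1, 16, anchor_dim)  # hidden states from one anchor layer
    aug_h = torch.randn(1, 16, aug_dim)        # hidden states from one augmenting layer
    print(block(anchor_h, aug_h).shape)        # torch.Size([1, 16, 4096])
```

The appeal for your use case is that the small model can be fine-tuned cheaply and only the bridge parameters need further training, but it is not a training-free merge.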

If focusing solely on merging, one current approach could be to first prune an LLM, e.g. reduce the number of layers in Mistral, then fine-tune that pruned model, and finally apply Data Flow Space (DFS) evolutionary merging to combine the pruned model with the larger one. This could be simpler because the hidden sizes stay the same, so we wouldn't have to deal with differing hidden dimensions.
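As a rough illustration of the first step, here is a short sketch that drops every other decoder layer from a Mistral-style checkpoint using transformers, producing a smaller model that could then be fine-tuned on limited hardware. The model name, layer-selection rule, and output path are assumptions for illustration only.

```python
# Hypothetical layer-pruning sketch: keep every other decoder layer of a Mistral-style model.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

keep = set(range(0, model.config.num_hidden_layers, 2))  # indices of layers to retain
model.model.layers = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i in keep
)
model.config.num_hidden_layers = len(keep)
model.save_pretrained("mistral-pruned-16l")  # illustrative output path
```

After fine-tuning the pruned model, a DFS-style evolutionary search would then decide how to interleave its layers with those of the full-size model; since both share the same hidden size, the merge only has to resolve layer ordering rather than dimension mismatches.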