jaggzh opened this issue 6 months ago
@jaggzh Great question! We're actively exploring this and are open to collaborations. So far, there aren't any concise papers on merging models with different architectures. It's worth noting that this problem would likely require reconciling hidden sizes and layer counts.
However, there is one notable work from Google DeepMind, "LLM Augmented LLMs: Expanding Capabilities through Composition" (CALM), which uses cross-attention to compose models, though it requires training: https://arxiv.org/abs/2401.02412
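To make the cross-attention idea concrete, here is a minimal numpy sketch of composing two models with different hidden sizes in the spirit of that paper. All shapes and names here are illustrative assumptions, not the paper's actual implementation: the anchor model's tokens query the augmenting model's hidden states after a learned projection bridges the dimension mismatch, and the result is added back residually.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: anchor model hidden size 8, augmenting model hidden
# size 12; the two models may also see different sequence lengths.
T_ANC, D_ANC = 4, 8
T_AUG, D_AUG = 6, 12

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(anchor_h, aug_h, W_proj, W_q, W_k, W_v):
    """Anchor tokens attend over the (projected) augmenting model's
    hidden states; attended values are added back residually."""
    aug_p = aug_h @ W_proj                    # (T_AUG, D_ANC): bridge dims
    q = anchor_h @ W_q                        # (T_ANC, D_ANC)
    k, v = aug_p @ W_k, aug_p @ W_v           # (T_AUG, D_ANC) each
    attn = softmax(q @ k.T / np.sqrt(D_ANC))  # (T_ANC, T_AUG)
    return anchor_h + attn @ v                # residual, (T_ANC, D_ANC)

anchor_h = rng.normal(size=(T_ANC, D_ANC))
aug_h = rng.normal(size=(T_AUG, D_AUG))
# In the real method these matrices are the (only) trained parameters;
# here they are just random placeholders.
W_proj = rng.normal(size=(D_AUG, D_ANC)) * 0.1
W_q, W_k, W_v = (rng.normal(size=(D_ANC, D_ANC)) * 0.1 for _ in range(3))

out = cross_attend(anchor_h, aug_h, W_proj, W_q, W_k, W_v)
```

The key point is that only the projection and attention matrices need training; both base models stay frozen, which is why this sidesteps the hidden-size mismatch that pure merging cannot.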
If focusing solely on merging, one current approach could be to first prune an LLM, e.g. reducing the parameter count of Mistral, then train the pruned model and apply Data Flow Space (DFS) evolutionary merging to combine it with a larger model. This could be simpler since we wouldn't have to deal with differing hidden sizes.
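The DFS merging step above can be sketched with a toy hill-climbing search. This is a deliberately simplified stand-in, assuming scalar "layers" and a made-up fitness target; real DFS merging evolves a path through the frozen transformer blocks of both models and scores it on benchmark tasks.

```python
import random

# Toy stand-ins: each "layer" is a scalar function; a model is a layer list.
# model_a plays the pruned model, model_b the larger one.
model_a = [lambda x, k=k: x + k for k in (1, 2, 3)]
model_b = [lambda x, k=k: x * k for k in (2, 3)]

# DFS merging searches over paths that interleave layers from both models.
layer_pool = [("A", i) for i in range(len(model_a))] + \
             [("B", i) for i in range(len(model_b))]

def run_path(path, x=1.0):
    for model, i in path:
        layer = model_a[i] if model == "A" else model_b[i]
        x = layer(x)
    return x

def fitness(path, target=20.0):
    # Hypothetical objective: closer to the target output = fitter.
    return -abs(run_path(path) - target)

def evolve(generations=200, path_len=4, seed=0):
    rng = random.Random(seed)
    best = [rng.choice(layer_pool) for _ in range(path_len)]
    for _ in range(generations):
        cand = list(best)
        cand[rng.randrange(path_len)] = rng.choice(layer_pool)  # mutate
        if fitness(cand) > fitness(best):                       # hill climb
            best = cand
    return best

best_path = evolve()
```

Actual evolutionary merging uses a population-based optimizer (e.g. CMA-ES) rather than single-candidate mutation, but the structure is the same: candidates are layer orderings, not weights, so no gradient training is needed.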
I'm wondering about the possibility of fine-tuning a small model (I have limited resources for a large one), upscaling it in the "best" way possible, and merging it with a large model. The motivation is that inference has less overhead and can run on systems where training such a model is not possible, but I'd still like to incorporate new knowledge into such a model... maybe. Has this been done? What approaches would it take?