The following article discusses merging a Japanese VLM with an English LLM.
In the article it states that they did the following:
Evolving the Weights for Mixing Parameters in the Parameter Space (PS): This step evolves the mixing coefficients that determine how the source models' weights are combined, optimizing them for task performance.
Evolving Layer Permutations in the Data Flow Space (DFS): This step evolves the sequence of layers that data flows through, drawing layers from both source models, so the inference path itself is optimized.
Integrated Strategy that Combines Both PS and DFS Merging: The final recipe applies both searches: the parameters are first mixed in the parameter space, and the resulting layer sequence is then evolved in the data flow space. This is not a simple copying or stitching of layers or parameters but a blending of weights and configurations, akin to mixing colors (as illustrated by the transition from red and blue to purple in the diagram).
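The three steps above can be sketched as a toy (1+1) evolutionary loop in plain Python. Everything here is illustrative: the "models" are just lists of per-layer weight vectors, the fitness function stands in for a real benchmark score, and the paper itself uses CMA-ES rather than this naive hill climber.

```python
import random

random.seed(0)

N_LAYERS = 4
model_a = [[1.0, 2.0]] * N_LAYERS  # stand-in for model A's per-layer weights
model_b = [[3.0, 0.5]] * N_LAYERS  # stand-in for model B's per-layer weights
TARGETS = [1.0, 2.0, 3.0, 4.0]     # toy per-position objective

def ps_merge(a, b, mix):
    """Parameter space (PS): per-layer linear interpolation of the weights."""
    return [[(1 - m) * x + m * y for x, y in zip(la, lb)]
            for la, lb, m in zip(a, b, mix)]

def dfs_arrange(layers, order):
    """Data flow space (DFS): permute the layer sequence data flows through."""
    return [layers[i] for i in order]

def fitness(layers):
    """Toy score: higher when each position's layer sum is near its target."""
    return -sum(abs(sum(l) - t) for l, t in zip(layers, TARGETS))

def mutate(genome):
    mix, order = genome
    mix = [min(1.0, max(0.0, m + random.gauss(0, 0.1))) for m in mix]
    order = order[:]
    i, j = random.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]  # swap two layer positions
    return mix, order

# Integrated strategy: evolve mixing weights (PS) and layer order (DFS) jointly.
best = ([0.5] * N_LAYERS, list(range(N_LAYERS)))
best_fit = fitness(dfs_arrange(ps_merge(model_a, model_b, best[0]), best[1]))
for _ in range(200):
    cand = mutate(best)
    f = fitness(dfs_arrange(ps_merge(model_a, model_b, cand[0]), cand[1]))
    if f > best_fit:
        best, best_fit = cand, f

print("evolved mixing weights:", [round(m, 2) for m in best[0]])
print("evolved layer order:   ", best[1])
```

The point of the sketch is that the two search spaces compose: a candidate is scored only after both the PS mix and the DFS ordering are applied, so the evolutionary loop optimizes them together rather than in isolation.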
The article is "Evolutionary Optimization of Model Merging Recipes".
Does mergekit have similar algorithms that could merge, say, BAAI/Bunny-Llama-3-8B-V with meta-llama/Meta-Llama-3-70B-Instruct?
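For context on what such a merge recipe looks like: mergekit's weight-mixing methods (linear, slerp, ties, dare_ties, task arithmetic) correspond roughly to PS merging, and its passthrough method to DFS-style layer stacking; there is also a separate mergekit-evolve tool for evolutionary recipe search. Below is a minimal sketch of a linear-merge recipe. Note the caveats: weight mixing requires matching tensor shapes, so an 8B checkpoint could not be mixed with a 70B one, and Bunny adds vision modules on top of Llama-3-8B, so even the 8B pairing shown is illustrative only.

```python
# Sketch of a mergekit YAML recipe for a linear (parameter-space) merge,
# held in a string for illustration. The model pairing is hypothetical.
recipe = """\
merge_method: linear
dtype: float16
models:
  - model: BAAI/Bunny-Llama-3-8B-V
    parameters:
      weight: 0.5
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.5
"""
print(recipe)
# Save as merge_config.yml and, with mergekit installed, run:
#   mergekit-yaml merge_config.yml ./merged-model
```

The per-model `weight` values play the role of the PS mixing coefficients described above; mergekit itself does not evolve them, which is what the evolutionary search in the article (and mergekit-evolve) adds on top.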