arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

[request] Support for Vision Language Models #434

Closed: NickGao96 closed this issue 3 weeks ago

NickGao96 commented 1 month ago

Thanks for the great work. I understand that at the moment the codebase is primarily for merging LLMs. However, VLMs such as LLaVA could also benefit from model merging. Since a VLM usually comprises submodules with different architectures (a transformer-based vision tower, a bridge, an LLM, etc.), I cannot find a way to merge a VLM with the current codebase in one go and have to merge the different blocks separately.

Is there any plan to support merging of such VLMs? I see an active branch, https://github.com/arcee-ai/mergekit/tree/multi-module-architecture @cg123, that seems to address a similar issue, but since the codebase is quite complicated, I cannot determine what this branch actually does. If it is indeed intended for this use case, could anyone please provide an example of how it works?

meiyiyeshi commented 1 month ago

I want to add a new architecture schema, but I don't know how to write the JSON. How do you create a new qwen2.json? And how do you merge a vision language model with a large language model, for example MiniCPM-V-2.6 with Qwen2?

NickGao96 commented 1 month ago

> I want to add a new architecture schema, but I don't know how to write the JSON. How do you create a new qwen2.json? And how do you merge a vision language model with a large language model, for example MiniCPM-V-2.6 with Qwen2?

@meiyiyeshi To merge new models, just add a new model JSON under mergekit/_data/architectures. To merge vision language models, there's a new VLM branch; I have tested it on my models and it worked.
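For reference, the architecture files follow roughly this shape. Here is an abridged sketch based on the main-branch qwen2.json (only a few of the per-layer weights are shown; the real file lists every projection, bias, and norm, and the exact schema on the VLM branch may differ):

```json
{
  "model_type": "qwen2",
  "architectures": ["Qwen2ForCausalLM"],
  "pre_weights": [
    { "name": "model.embed_tokens.weight", "is_embed": true }
  ],
  "post_weights": [
    { "name": "model.norm.weight" },
    { "name": "lm_head.weight", "is_embed": true }
  ],
  "num_layers_config_key": "num_hidden_layers",
  "layer_templates": {
    "weights": [
      { "name": "model.layers.${layer_index}.input_layernorm.weight" },
      { "name": "model.layers.${layer_index}.self_attn.q_proj.weight" },
      { "name": "model.layers.${layer_index}.mlp.down_proj.weight" }
    ]
  }
}
```

The `${layer_index}` placeholder is expanded for each layer, so you mostly just enumerate the tensor names from your model's state dict and point `num_layers_config_key` at the right config field.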

meiyiyeshi commented 1 month ago

First of all, thank you very much for taking the time to answer my question. How do I write the model JSON for a VLM? I'm stuck there right now: the JSON I wrote myself has no effect, or mergekit reports that the MiniCPM-V model is not supported.

meiyiyeshi commented 1 month ago

@NickGao96 Do you have an example of merging a VLM with an LLM? I would like to learn more if I can.

NickGao96 commented 4 weeks ago

> @NickGao96 Do you have an example of merging a VLM with an LLM? I would like to learn more if I can.

It seems they are not moving forward with this VLM branch, but it looks like a good temporary fix. Check this out: https://github.com/arcee-ai/mergekit/blob/VLM/mergekit/_data/architectures/qwen2_vl.json
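With that branch checked out, a plain merge config on the whole model should work. As a rough sketch (the second model name is a placeholder for your own fine-tune, and linear with equal weights is just one choice of method):

```yaml
# Sketch: merging two Qwen2-VL checkpoints on the VLM branch.
# "your-org/your-qwen2-vl-finetune" is a placeholder; substitute a real model.
models:
  - model: Qwen/Qwen2-VL-7B-Instruct
    parameters:
      weight: 0.5
  - model: your-org/your-qwen2-vl-finetune
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

Then run it as usual, e.g. `mergekit-yaml config.yaml ./merged-qwen2-vl`.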

meiyiyeshi commented 4 weeks ago

@NickGao96 Thank you very much, and have a great day.

ElliotStein commented 4 weeks ago

@NickGao96 @meiyiyeshi You're right about the VLM branch, we got that working for Qwen VL models and then decided it'd be better to further generalise to any architecture, precisely because of the structural differences with VLMs that you mentioned in your first comment.

This is in progress on the architecture-agnostic branch. The idea is that we should be able to merge models which have the same architecture, without requiring an architecture specification file.

NickGao96 commented 3 weeks ago

> @NickGao96 @meiyiyeshi You're right about the VLM branch, we got that working for Qwen VL models and then decided it'd be better to further generalise to any architecture, precisely because of the structural differences with VLMs that you mentioned in your first comment.
>
> This is in progress on the architecture-agnostic branch. The idea is that we should be able to merge models which have the same architecture, without requiring an architecture specification file.

Thank you, and I'm looking forward to trying the architecture-agnostic branch when it's ready to be merged into main. It indeed makes more sense to be able to merge models without specifying an architecture file.

YuanLiuuuuuu commented 1 week ago

> @NickGao96 @meiyiyeshi You're right about the VLM branch, we got that working for Qwen VL models and then decided it'd be better to further generalise to any architecture, precisely because of the structural differences with VLMs that you mentioned in your first comment.
>
> This is in progress on the architecture-agnostic branch. The idea is that we should be able to merge models which have the same architecture, without requiring an architecture specification file.

Very impressive project. Could you please tell me when the architecture-agnostic branch will be ready to merge?

ElliotStein commented 1 week ago

@YuanLiuuuuuu I just pushed another commit, give it a go! Just check out the architecture-agnostic branch. I will do some more testing and documentation before merging into the main branch, but it's ready to go now. Let us know how it goes!

Note that if you want to merge sub-modules, e.g. merge the language component of a VLM with another language model, that is (experimentally) available in this branch too. Just check out the fill_missing_params script, as you'll need to run it to copy over the leftover parameters and config files that aren't part of the submodule. This will likely be made more streamlined in the future.

YuanLiuuuuuu commented 1 week ago

> @YuanLiuuuuuu I just pushed another commit, give it a go! Just check out the architecture-agnostic branch. I will do some more testing and documentation before merging into the main branch, but it's ready to go now. Let us know how it goes!
>
> Note that if you want to merge sub-modules, e.g. merge the language component of a VLM with another language model, that is (experimentally) available in this branch too. Just check out the fill_missing_params script, as you'll need to run it to copy over the leftover parameters and config files that aren't part of the submodule. This will likely be made more streamlined in the future.

Thank you for your timely response. I will give it a try and provide feedback as soon as possible.

YuanLiuuuuuu commented 6 days ago

> @YuanLiuuuuuu I just pushed another commit, give it a go! Just check out the architecture-agnostic branch. I will do some more testing and documentation before merging into the main branch, but it's ready to go now. Let us know how it goes!
>
> Note that if you want to merge sub-modules, e.g. merge the language component of a VLM with another language model, that is (experimentally) available in this branch too. Just check out the fill_missing_params script, as you'll need to run it to copy over the leftover parameters and config files that aren't part of the submodule. This will likely be made more streamlined in the future.

I have tried it, and it works now. Thank you for your help.