ByungKwanLee / MoAI

Official PyTorch implementation code realizing the technical part of Mixture of All Intelligence (MoAI), which improves performance on numerous zero-shot vision-language tasks. (Under Review)

LoRA analogy #20

Open YasserdahouML opened 1 month ago

YasserdahouML commented 1 month ago

Hello, I saw in the paper

[Screenshot of the relevant equation from the paper]

but in the code, we see at https://github.com/ByungKwanLee/MoAI/blob/a7728a8d1c8df27d3221708a4ca4366e271f51c8/moai/arch/expert_module.py#L143

How does this save compute, given that the attention is still performed at d=4096 instead of r=64?

Also, did you try just concatenating the auxiliary info to the LLM inputs along with the image features and the prompt? We see the answer in the repo, but did you actually train this way?

ByungKwanLee commented 1 month ago
  1. Both the 4096 x 64 and 64 x 4096 projections together are more efficient than a single 4096 x 4096 one, because of the simple inequality 4096 x 64 + 64 x 4096 < 4096 x 4096 (see the sketch after this list).

  2. Yes, I did try that, but directly using the auxiliary outputs often embeds wrong information, because the computer-vision models are not perfect and sometimes give wrong answers.
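
A quick back-of-the-envelope check of that inequality in PyTorch (a minimal sketch, not the repository's code; the layer names are illustrative):

```python
import torch
import torch.nn as nn

d, r = 4096, 64  # hidden size and low-rank dimension from the discussion

# Full d x d projection vs. a pair of thin low-rank projections.
full   = nn.Linear(d, d, bias=False)   # 4096 x 4096 weights
lora_a = nn.Linear(d, r, bias=False)   # 4096 x 64 down-projection
lora_b = nn.Linear(r, d, bias=False)   # 64 x 4096 up-projection

n_full = full.weight.numel()                            # 16,777,216
n_lora = lora_a.weight.numel() + lora_b.weight.numel()  # 524,288
print(n_full, n_lora, n_lora < n_full)                  # True: ~32x fewer parameters

# The same ratio holds for the matmul FLOPs of the added branch:
x = torch.randn(1, 256, d)
delta = lora_b(lora_a(x))   # two thin matmuls instead of one d x d matmul
```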

YasserdahouML commented 1 month ago

I am referring more to this line: https://github.com/ByungKwanLee/MoAI/blob/a7728a8d1c8df27d3221708a4ca4366e271f51c8/moai/arch/build_mlp.py#L211

This line does not seem to be in InternLM, but in InternLM-XComposer2-7B. Does that mean you already start from a VLM and adapt it, rather than from an LLM only?

ByungKwanLee commented 1 month ago

Based on InternLM2, we employed the image-part adaptation so the VLM knows where the image tokens are, and the results show that it gives somewhat better performance. Therefore, we jointly train it with the MoAI components. We will update the details of this part and of the vision-projector training. Nonetheless, we observed that MoAI provides quite good performance compared with LLaVA-7B or any other baselines without it.
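
For concreteness, here is a hedged sketch of what a Partial-LoRA-style ("image-part adaptation", as in InternLM-XComposer2) projection can look like; the class and argument names are hypothetical and this is not MoAI's actual code:

```python
import torch
import torch.nn as nn

class PartialLoRALinear(nn.Module):
    """Illustrative Partial-LoRA-style projection: the low-rank update is
    added only at image-token positions, while text tokens use the frozen
    base weight alone. Hypothetical names/shapes, not MoAI's implementation."""

    def __init__(self, d_in: int = 4096, d_out: int = 4096, r: int = 64):
        super().__init__()
        self.base   = nn.Linear(d_in, d_out, bias=False)  # frozen LLM projection
        self.lora_a = nn.Linear(d_in, r, bias=False)      # trainable down-projection
        self.lora_b = nn.Linear(r, d_out, bias=False)     # trainable up-projection
        self.base.weight.requires_grad_(False)

    def forward(self, x: torch.Tensor, im_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in); im_mask: (batch, seq) bool, True where the
        # token belongs to the image part of the sequence.
        out = self.base(x)
        delta = self.lora_b(self.lora_a(x))
        return out + delta * im_mask.unsqueeze(-1).to(out.dtype)
```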

YasserdahouML commented 1 month ago

Do you mean you use PLoRA at the SFT stage when training the MoAI modules, or do you use the PLoRA weights given by InternLM-XComposer2? And why not show the results without this PLoRA part (MoAI modules only)?

This is the bit of code I am referring to: https://huggingface.co/internlm/internlm-xcomposer2-7b/blob/main/build_mlp.py#L205

ByungKwanLee commented 1 month ago

I mean I used PLoRA at the SFT stage when training the MoAI modules. I will also run the ablation with and without PLoRA, but it gives only a 2~3% margin.

YasserdahouML commented 1 month ago

Can you refer me to which LLM weights you used, please? I compared the weights of your model to the ones in InternLM2-7B and they are different; does this suggest that you trained the LLM? In fact, they are close to the InternLM-XComposer2 ones, and the same holds for the CLIP weights, which are similar to InternLM-XComposer2's rather than OpenAI's. Can you share more details about the pretraining stage? What kind of weights did you start from when doing the SFT stage?