HITsz-TMG / UMOE-Scaling-Unified-Multimodal-LLMs

The codes about "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
https://uni-moe.github.io/

Clarification on 3-Step Training Approach and Commands for Uni-MoE v2 #9

Open Bhagyashreet20 opened 1 week ago

Bhagyashreet20 commented 1 week ago

I like the innovative three-step training approach for training MLLMs. It intrigued me, and I was going through the scripts trying to replicate the three-step training technique to train my own model. However, I have a few queries:

  1. Is it possible to replicate all three training steps with the scripts in the uni-moe-v2 folder?
  2. Could you share the command to train uni-moe-v2-speech, since only inference and eval scripts are provided?
  3. Relating the three-step training approach to the released model checkpoints: Uni-MoE 8-expert base is the result of step 1, Uni_MoE 8-expert experts is the model after step 2, and Uni_MoE 8-expert finetune is the model after step 3. Is my understanding correct?

expapa commented 1 week ago

Thanks for your attention and support for our model! Here are some replies; hope they are helpful for you:

  1. Sorry, we are not releasing the training scripts for the first two stages, but those stages can be reproduced by removing the MoE structure from the code (see the first sketch after this list).
  2. Sure, the training script will be uploaded soon; check it out then.
  3. Actually, the projector and Q-Former weights are all updated during the first, second, and third stages. So Uni-MoE 8-expert base is the base model from which we train all our stages; Uni_MoE 8-expert experts is the stage-2 result, which contains the MLPs from the stage-2 models; and Uni_MoE 8-expert finetune holds the LoRA weights plus the final Q-Former and projector weights for the MoE model (see the second sketch after this list).
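
For point 1, a minimal PyTorch sketch of what "removing the MoE structure" could look like: each MoE block is swapped for a single dense expert MLP, so stages 1-2 train an ordinary dense model. The attribute name `experts` and the module layout are assumptions for illustration, not the actual Uni-MoE code.

```python
import torch.nn as nn

def disable_moe(model: nn.Module) -> nn.Module:
    """Replace every MoE block with one of its expert MLPs so the model
    behaves like a dense (non-MoE) network for stage-1/2 style training.
    The `experts` attribute name is an assumption; adapt it to the real
    module layout of the model being trained."""
    replacements = []
    for name, module in model.named_modules():
        if not name:
            continue  # skip the root module
        if hasattr(module, "experts") and isinstance(module.experts, nn.ModuleList):
            replacements.append((name, module.experts[0]))
    for name, dense_ffn in replacements:
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name, dense_ffn)  # swap MoE block for one dense FFN
    return model
```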
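
For point 3, a hedged sketch of how the released checkpoints could be stacked at load time, assuming a Hugging Face / PEFT-style layout: the base checkpoint provides the backbone, and the finetune checkpoint provides the LoRA adapter plus the updated Q-Former and projector weights (the stage-2 expert MLPs would be merged in by the repository's own loading code, which is not shown). All paths and file names below are placeholders and may not match the repository.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder paths; substitute the actual released checkpoint directories.
BASE_DIR = "path/to/Uni-MoE-8-expert-base"          # base model all stages start from
FINETUNE_DIR = "path/to/Uni-MoE-8-expert-finetune"  # stage-3 LoRA + Q-Former/projector weights

base = AutoModelForCausalLM.from_pretrained(BASE_DIR, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, FINETUNE_DIR)  # attach the stage-3 LoRA adapter

# Hypothetical file name: the non-LoRA trainables (Q-Former, projector) saved
# alongside the adapter. Key names may need a prefix adjustment (e.g.
# "base_model.model.") depending on how they were saved; strict=False leaves
# all other weights untouched.
extra = torch.load(f"{FINETUNE_DIR}/non_lora_trainables.bin", map_location="cpu")
model.load_state_dict(extra, strict=False)
```
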
Bhagyashreet20 commented 1 week ago

Cool, thanks!