PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0
1.9k stars 121 forks

Training of Stage 3: the parameters actually trained in the code do not match the paper #85

Open Wuyingwen opened 1 month ago

Wuyingwen commented 1 month ago

Question

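In the schematic diagram, MoE stage 3 shows the projector (MLP) as not being trained. In the actual code, however, the QWen-Stage2 pretrained model has freeze_mm_mlp_adapter=False, which means the mm_projector parameters are also updated during stage 3. How should this discrepancy be explained?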
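To make the question concrete, here is a minimal sketch of what a flag like freeze_mm_mlp_adapter would be expected to do, assuming it simply disables gradients on the projector parameters. The helper apply_freeze_flag, the stand-in parameter objects, and the second parameter name are hypothetical and are not taken from the repository; with PyTorch one would pass model.named_parameters() instead of the stand-in list.

```python
from types import SimpleNamespace


def apply_freeze_flag(named_params, freeze_mm_mlp_adapter, key="mm_projector"):
    """Hypothetical helper: when the flag is True, turn off gradients for
    every parameter whose name contains `key`. Returns the frozen names."""
    frozen = []
    if not freeze_mm_mlp_adapter:
        # Flag is False: nothing is frozen, so the projector stays trainable.
        return frozen
    for name, param in named_params:
        if key in name:
            param.requires_grad = False
            frozen.append(name)
    return frozen


# Stand-in objects that only mimic the `requires_grad` attribute of a parameter.
params = [
    ("transformer.mm_projector.image_spatial_proj.0.weight",
     SimpleNamespace(requires_grad=True)),
    # Hypothetical non-projector parameter name, for illustration only.
    ("transformer.some_other_layer.weight",
     SimpleNamespace(requires_grad=True)),
]

# With the value reported in this issue (False), no parameter is frozen.
frozen_when_false = apply_freeze_flag(params, freeze_mm_mlp_adapter=False)
projector_trainable_when_false = params[0][1].requires_grad

# With True, only the projector parameter is frozen.
frozen_when_true = apply_freeze_flag(params, freeze_mm_mlp_adapter=True)
```

Under this assumption, freeze_mm_mlp_adapter=False leaves the projector trainable, which is the behavior the question describes.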

Wuyingwen commented 1 month ago

Before training starts in train.py, I printed the trainable parameters. They include the following entries:
transformer.mm_projector.image_spatial_proj.0.weight
transformer.mm_projector.image_spatial_proj.0.bias
transformer.mm_projector.image_spatial_proj.2.weight
transformer.mm_projector.image_spatial_proj.2.bias
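A check of this kind can be sketched as follows. This is not the code from train.py: trainable_parameter_names is a hypothetical helper, and the stand-in objects only mimic the requires_grad attribute, so the sketch runs without a model. With PyTorch, one would pass model.named_parameters(), which yields (name, parameter) pairs.

```python
from types import SimpleNamespace


def trainable_parameter_names(named_params, key=None):
    """Hypothetical helper: return the names of parameters that will receive
    gradient updates, optionally restricted to names containing `key`."""
    return [
        name
        for name, param in named_params
        if param.requires_grad and (key is None or key in name)
    ]


# Stand-ins for the four projector parameters listed above, marked trainable,
# plus one hypothetical frozen parameter for contrast.
named_params = [
    ("transformer.mm_projector.image_spatial_proj.0.weight", SimpleNamespace(requires_grad=True)),
    ("transformer.mm_projector.image_spatial_proj.0.bias", SimpleNamespace(requires_grad=True)),
    ("transformer.mm_projector.image_spatial_proj.2.weight", SimpleNamespace(requires_grad=True)),
    ("transformer.mm_projector.image_spatial_proj.2.bias", SimpleNamespace(requires_grad=True)),
    ("transformer.hypothetical_frozen_layer.weight", SimpleNamespace(requires_grad=False)),
]

projector_trainable = trainable_parameter_names(named_params, key="mm_projector")
for name in projector_trainable:
    print(name)
```

If the projector were frozen as the diagram suggests, the filtered list would be empty.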