PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0
1.9k stars, 121 forks

[Question] About the stage-3 training loss #66

Open rangmiao opened 5 months ago

rangmiao commented 5 months ago

Question

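Hello authors, and thank you very much for your work! While reproducing MoE-phi2X4-Top2, I see that moe_loss keeps fluctuating around 16 throughout the third training stage. Is this normal?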