PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0
1.97k stars, 125 forks

[Question] Step 3 loss curve #89

Open fanminshi opened 2 months ago

fanminshi commented 2 months ago

Question

I ran the step 3 MoE fine-tuning on the phi-2 model, but the loss doesn't seem to drop much. Is that normal? Thanks!

[Image: Screenshot 2024-08-16 at 8 29 19 AM]