PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0

Stage 2: what is a reasonable value for the loss to drop to? #31

Open awzhgw opened 7 months ago

awzhgw commented 7 months ago

When I ran stage 2, training was very slow and GPU utilization kept swinging between high and low.

I also see that the loss started at 1.1 and, after 20 steps, is still around 1.2, so it does not appear to be decreasing. Is this normal?

When you trained the model, what value did the stage 2 loss drop to, and what would you consider reasonable?

LinB203 commented 7 months ago

Refer to the training logs from LLaVA.

https://wandb.ai/lht/huggingface/reports/LLaVA-v1-5-Training-Logs--Vmlldzo1NzE1Mzcx?accessToken=gh6ju8v40olw6d7p9ja498ntyxrddbzy23xovhde3hc8nk5dva2jy46u5hj0dg2u