[Question] The evaluation results vary every time.

PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

https://arxiv.org/abs/2401.15947

Apache License 2.0

1.9k stars, 121 forks

[Question] The evaluation results vary every time. #59

Closed by koda-11 6 months ago

koda-11 commented 6 months ago

Question

When I evaluate the LanguageBind/MoE-LLaVA-Phi2-2.7B-4e model, the results vary from run to run (e.g., 61.42 on the first run and 61.32 on the second for the GQA evaluation). Is this variation expected, or does it indicate a problem?

Thanks.