LLaVA-VL / LLaVA-NeXT

Apache License 2.0

Evaluation of the video detailed description #122

Open royzhang12 opened 1 month ago

royzhang12 commented 1 month ago

Hi @ZhangYuanhan-AI
Thanks for the wonderful work. I have a question about the evaluation of the detailed description. I found that the GPT eval score is converted to an integer with int(score), which does not seem quite reasonable: a score of 4.8 or 4.9 is truncated to 4, so the result is capped at 4 unless the model returns exactly 5.
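To illustrate the concern, here is a minimal sketch of how int() truncates toward zero instead of rounding. The score values and variable names are made up for illustration and are not taken from the repository's evaluation script; the sketch also assumes the scores arrive as strings, which is why they are parsed with float() first.

```python
# Hypothetical judge outputs; not real evaluation data.
raw_scores = ["4.8", "4.9", "4.0", "3.2"]

# int() drops the fractional part (truncation toward zero).
truncated = [int(float(s)) for s in raw_scores]

# round() maps each value to the nearest integer instead.
rounded = [round(float(s)) for s in raw_scores]

print(truncated)  # [4, 4, 4, 3]
print(rounded)    # [5, 5, 4, 3]
```

With truncation, 4.8 and 4.9 both count as 4, whereas rounding would count them as 5.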

ZhangYuanhan-AI commented 1 month ago

This is because we found that the ChatGPT model cannot perfectly grasp the meaning of decimals. See, for example, https://community.openai.com/t/why-9-11-is-larger-than-9-9-incredible/869824