evaluation version of gpt-3.5-turbo

LLaVA-VL / LLaVA-NeXT

Apache License 2.0

2.44k stars 174 forks source link

evaluation version of gpt-3.5-turbo #109

Closed farewellthree closed 1 month ago

farewellthree commented 1 month ago

Hello, I cannot correctly reproduce the test results of llava-next-video. I suspect it might be an issue with the GPT-3.5-turbo version. Different periods had different versions of GPT-3.5-turbo. Using the latest GPT-3.5-turbo-16k version significantly raises the results. The previous model did not have the GPT-3.5-turbo-16k version. So, could you please tell me which version of GPT was used for evaluation in llava-next-video?

ZhangYuanhan-AI commented 1 month ago

gpt-3.5-turbo-0613

Wang-Xiaodong1899 commented 2 weeks ago

hi @farewellthree , if you use gpt-3.5-turbo-0613, can you reproduce the scores in the VideoChatgpt datasets? I found it is also super higher than the score in the table. Such as the tempral score, I obtained 4.12 for LLaVA-NeXT-Video-DPO (7B). Very high!