Closed by farewellthree 1 month ago
gpt-3.5-turbo-0613
Hi @farewellthree, if you use gpt-3.5-turbo-0613, can you reproduce the scores on the VideoChatGPT datasets? I found that my scores are also much higher than those in the table. For example, I obtained a temporal score of 4.12 for LLaVA-NeXT-Video-DPO (7B). Very high!
Hello, I cannot reproduce the reported test results of llava-next-video. I suspect the cause is the GPT-3.5-turbo version, since different versions were served at different times. Using the latest GPT-3.5-turbo-16k version raises the results significantly, and that version was not available earlier. Could you please tell me which GPT version was used for evaluation in llava-next-video?