zhijian-liu opened this issue 1 month ago
I would definitely suggest using `gpt-4o-mini`.
We should inform users about the eval model change once the PR lands; we should probably also list the affected datasets in this issue, with the eval models used before and after.
I'll pin this issue and link to the PR for visibility once it is created.
Some benchmarks (such as ActivityNet, VideoChatGPT, and many others) use `gpt-3.5-turbo-0613` for evaluation, but this model has been discontinued by OpenAI. One quick fix would be to switch to `gpt-3.5-turbo`, but I would also like to open a discussion on whether to switch all uses of `gpt-3.5-turbo` to `gpt-4o-mini`, since its performance is better and it is about 3 times cheaper. After the discussion, I'm happy to submit a PR to make the change. @Luodian @kcz358