EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module: lmms-eval.
https://lmms-lab.framer.ai/

Discussion: Update GPT eval models #294

Open zhijian-liu opened 1 month ago

zhijian-liu commented 1 month ago

Some benchmarks (such as ActivityNet, VideoChatGPT, and many others) use gpt-3.5-turbo-0613 for evaluation, but this model has been discontinued by OpenAI. One quick fix would be to switch to gpt-3.5-turbo, but I would also like to open a discussion about whether to switch all uses of gpt-3.5-turbo to gpt-4o-mini, since it performs better and is roughly three times cheaper.
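For context, the judge model is typically just a model-name string passed to the OpenAI chat completions API, so the quick fix is essentially a one-line change. Here is a minimal sketch of what such a judge call looks like; the constant name, function, and prompt are illustrative, not the exact lmms-eval code:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative constant; lmms-eval's actual config/variable names may differ.
GPT_EVAL_MODEL = "gpt-4o-mini"  # was "gpt-3.5-turbo-0613" (discontinued by OpenAI)

def judge(question: str, answer: str, prediction: str) -> str:
    """Ask the judge model to score a prediction against the reference answer."""
    response = client.chat.completions.create(
        model=GPT_EVAL_MODEL,
        temperature=0,  # keep judging as deterministic as possible
        messages=[
            {"role": "system", "content": "You are an evaluator. Score the prediction."},
            {
                "role": "user",
                "content": f"Question: {question}\nAnswer: {answer}\nPrediction: {prediction}",
            },
        ],
    )
    return response.choices[0].message.content
```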

After the discussion, I'm happy to submit a PR to make the change. @Luodian @kcz358

Luodian commented 1 month ago

I would definitely suggest using gpt-4o-mini.

We should inform users about the eval model change once the PR lands; we should probably also list the affected datasets in this issue, with the eval models used before and after.

Once the PR is created, I'll pin this issue and link to it for visibility.