OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
https://vchat.opengvlab.com/
MIT License

The evaluation for EgoSchema #195

Closed. 921112343 closed this issue 1 week ago.

921112343 commented 2 weeks ago

Thank you for open-sourcing this excellent work. I have noticed that your model performs exceptionally well on the EgoSchema dataset. However, I found detailed descriptions of your evaluation process only for NExT-QA, STAR, and TVQA in the Readme. Could you please share the steps you took to prepare the EgoSchema data and how you evaluated your model on it?

Andy1621 commented 2 weeks ago

Hi~ Thanks for your interest. I will provide a script for EgoSchema later~

Andy1621 commented 2 weeks ago

@921112343 Please check https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/demo/demo_mistral.ipynb
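In case the notebook is hard to skim, here is a minimal sketch of an EgoSchema-style multiple-choice loop along the same lines. This is not the notebook's exact code: the annotation file name, its field names (`question`, `options`, `answer_idx`, `video_path`), and the `predict_fn` hook are all assumptions to adapt.

```python
import json

OPTION_LETTERS = ["A", "B", "C", "D", "E"]  # EgoSchema is 5-way multiple choice

def build_prompt(sample):
    """Format the question plus the five candidate answers as one prompt."""
    options = "\n".join(
        f"({letter}) {opt}"
        for letter, opt in zip(OPTION_LETTERS, sample["options"])
    )
    return (f"Question: {sample['question']}\n"
            f"Options:\n{options}\n"
            "Answer with the option's letter from the given choices directly.")

def evaluate(samples, predict_fn):
    """predict_fn(video_path, prompt) -> the model's reply string,
    e.g. a thin wrapper around the chat call in the notebook."""
    correct = 0
    for s in samples:
        reply = predict_fn(s["video_path"], build_prompt(s))
        pred = reply.strip()[0].upper()  # take the leading option letter
        correct += pred == OPTION_LETTERS[s["answer_idx"]]
    return correct / len(samples)

# Hypothetical annotation layout:
# egoschema_subset.json = [{"video_path": ..., "question": ...,
#                           "options": [...], "answer_idx": 0}, ...]
```

Since EgoSchema is plain 5-way multiple choice over ~3-minute egocentric clips, accuracy is just the fraction of matched option letters.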

yuanrr commented 2 weeks ago

Very wonderful update!! Could you please also offer some examples for evaluating zero-shot VideoQA, like MSVD and MSRVTT? Thanks a lot!

Andy1621 commented 2 weeks ago

For VideoQA, you can simply modify the code to save the responses, and then use ChatGPT to give a score.

However, I don't suggest evaluating VideoLMMs' capabilities on traditional QA benchmarks like MSRVTT/MSVD, which cannot reveal their essential problems.
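For concreteness, here is a minimal sketch of that scoring step, following the GPT-based protocol popularized by Video-ChatGPT (yes/no correctness plus a 0-5 score). The prompt wording and the `responses.json` layout are assumptions, not this repo's script.

```python
import ast
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def gpt_score(question, answer, prediction):
    """Ask ChatGPT whether a predicted answer matches the ground truth."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": ("You evaluate video question-answer pairs. "
                         "Reply only with a Python dict such as "
                         "{'pred': 'yes', 'score': 4}, where 'pred' is yes/no "
                         "correctness and 'score' is an integer from 0 to 5.")},
            {"role": "user",
             "content": (f"Question: {question}\n"
                         f"Correct Answer: {answer}\n"
                         f"Predicted Answer: {prediction}")},
        ],
    )
    return ast.literal_eval(resp.choices[0].message.content)

# responses.json (hypothetical): [{"question": ..., "answer": ..., "pred": ...}, ...]
with open("responses.json") as f:
    records = json.load(f)
results = [gpt_score(r["question"], r["answer"], r["pred"]) for r in records]
accuracy = sum(r["pred"] == "yes" for r in results) / len(results)
avg_score = sum(r["score"] for r in results) / len(results)
print(f"Accuracy: {accuracy:.3f}  Average score: {avg_score:.2f}")
```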

yuanrr commented 2 weeks ago

Thank you for your reply. I was wondering whether any recent work discusses these benchmarks like MSRVTT/MSVD... your view is quite new to me...

921112343 commented 2 weeks ago

> @921112343 Please check https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/demo/demo_mistral.ipynb

Thanks for your update!!

Andy1621 commented 2 weeks ago

> Thank you for your reply. I was wondering whether any recent work discusses these benchmarks like MSRVTT/MSVD... your view is quite new to me...

No work discusses the potential bias in such benchmarks yet, but one paper has revealed the single-frame bias of video datasets. Please check Revealing Single Frame Bias for Video-and-Language Learning.