Could you provide evaluation codes for NExT-QA, STAR and TVQA on video_chat2?

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

https://vchat.opengvlab.com/

MIT License


Issue #89

Open CUCldyyyyy opened 6 months ago

CUCldyyyyy commented 6 months ago

Hey, your work is really impressive! Could you provide evaluation code for NExT-QA, STAR and TVQA? It seems some changes must be made to the original mvbench.ipynb file. I'd appreciate it if you could. Thanks again!

Andy1621 commented 6 months ago

For NExT-QA, STAR and TVQA, we simply modify the code in mvbench.ipynb. You need to prepare the corresponding datasets and use the same testing prompt.
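As a rough illustration of "the same testing prompt": the inference log later in this thread shows the multiple-choice format (`Question: ... Options: (A) ... Only give the best option.`, with the reply pre-filled as `Best option:(`). The sketch below rebuilds that string from a question and its options and maps the generated letter back to an option index. The helper names `build_mc_prompt` and `parse_choice` are hypothetical, and the exact wording and system prompt used in mvbench.ipynb may differ, so check it against the notebook.

```python
from string import ascii_uppercase
from typing import List, Optional, Tuple


def build_mc_prompt(question: str, options: List[str]) -> Tuple[str, str]:
    """Return (user_prompt, answer_prefix) for one multiple-choice question.

    Format inferred from the inference log in this thread (an assumption).
    """
    if len(options) > len(ascii_uppercase):
        raise ValueError("too many options to label with single letters")
    # Label options (A), (B), ... and end each with exactly one period.
    labeled = " ".join(
        f"({ascii_uppercase[i]}) {opt.rstrip('.')}." for i, opt in enumerate(options)
    )
    user_prompt = f"Question: {question} Options: {labeled} Only give the best option."
    # The assistant turn is pre-filled so the model only needs to emit a letter.
    answer_prefix = "Best option:("
    return user_prompt, answer_prefix


def parse_choice(generated: str, num_options: int) -> Optional[int]:
    """Map the first generated character (e.g. 'B') to a 0-based option index."""
    letter = generated.strip()[:1].upper()
    idx = ascii_uppercase.find(letter) if letter else -1
    return idx if 0 <= idx < num_options else None
```

With the owl question from the log, `build_mc_prompt` reproduces the prompt text shown there, and a generation such as `"B) green man instructed owl"` parses to index 1.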

Andy1621 commented 6 months ago

You can follow SeViLA to prepare the datasets and change the code to load the JSON files.
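A minimal sketch of the "load the JSON" step, assuming the prepared annotations are a JSON list with one object per question. The key names `video`, `question`, `options` and `answer` (0-based index) are hypothetical placeholders, not SeViLA's actual schema; rename them to match the files your preprocessing produces.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class MCSample:
    video: str           # path or id of the video clip
    question: str
    options: List[str]
    answer: int          # 0-based index of the correct option


def load_mc_annotations(path: str) -> List[MCSample]:
    # HYPOTHETICAL schema: [{"video": ..., "question": ..., "options": [...], "answer": 0}, ...]
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    samples = []
    for rec in records:
        options = list(rec["options"])
        answer = int(rec["answer"])
        # Catch off-by-one or label-format errors early instead of at scoring time.
        if not 0 <= answer < len(options):
            raise ValueError(f"answer index {answer} out of range for {rec['video']}")
        samples.append(MCSample(rec["video"], rec["question"], options, answer))
    return samples
```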

CUCldyyyyy commented 6 months ago

I see, thanks for your reply.

CUCldyyyyy commented 6 months ago

Hey! A problem occurred in my inference code. Could you tell me the possible reason?

`Question: why does the owl fly back to the man in green and land on the arm of the lady in white? Options: (A) defend itself. (B) green man instructed owl. (C) greeting lady. (D) escape man. (E) find food. Only give the best option.

Assistant: Best option:(

(bolds君 — † GT: (B) green man instructed owl. Part Acc: 0.00% Total Acc: 0.00%`

CUCldyyyyy commented 6 months ago

The text generated after "Best option" is garbled, and the error is: IndexError: piece id is out of range.

Andy1621 commented 6 months ago

You might be using the wrong version of Vicuna-v0; please check https://github.com/OpenGVLab/Ask-Anything/issues/81
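`IndexError: piece id is out of range.` is the message SentencePiece raises when it is asked to decode an id outside its vocabulary. One plausible reading, consistent with the wrong-Vicuna-version explanation, is that mismatched LLM weights and tokenizer let the model emit ids the tokenizer has no piece for. A hypothetical sanity check is sketched below; with Hugging Face transformers the two sizes would typically come from `model.config.vocab_size` and `len(tokenizer)`, but verify this against your own loading code.

```python
from typing import List, Sequence


def out_of_range_ids(token_ids: Sequence[int], vocab_size: int) -> List[int]:
    """Return generated ids the tokenizer cannot decode (id < 0 or id >= vocab_size)."""
    return [t for t in token_ids if t < 0 or t >= vocab_size]


def check_vocab_match(model_vocab_size: int, tokenizer_vocab_size: int) -> None:
    # If the LLM output layer has more rows than the tokenizer has pieces,
    # sampling can produce ids that make decoding fail or yield garbage.
    if model_vocab_size != tokenizer_vocab_size:
        raise ValueError(
            f"LLM vocab ({model_vocab_size}) != tokenizer vocab ({tokenizer_vocab_size}); "
            "weights and tokenizer may come from different LLaMA/Vicuna versions"
        )
```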

CUCldyyyyy commented 6 months ago

Thank you for your response! I have resolved the issue. Additionally, I noticed that the names of the fine-tuned weights for all three released stages carry a '7b' suffix. When I directly use 'vicuna-13b-v0', I get a dimension mismatch error while loading the model weights (4096 vs. 5120). How can this be resolved? Do I need to modify the dimensions in the source code, or is it because no 13B version of the fine-tuned weights has been released?

Andy1621 commented 6 months ago

Yes! The model needs to be retrained with the new LLM. Currently, we do not release the 13B model because the improvement is only marginal~
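The 4096 vs. 5120 mismatch matches the hidden sizes of LLaMA-7B (4096) and LLaMA-13B (5120): any layer that maps visual features into the LLM embedding space was trained for the 7B width, so those weights cannot be loaded into a 13B model without retraining. A minimal sketch of listing such mismatches before loading, by comparing name-to-shape dicts (in PyTorch these could be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`). The parameter names and the 768 input width in the example are illustrative assumptions, not the actual VideoChat2 checkpoint keys.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    ckpt_shapes: Dict[str, Shape], model_shapes: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """List parameters present in both dicts whose shapes differ."""
    return [
        (name, ckpt_shapes[name], model_shapes[name])
        for name in sorted(ckpt_shapes.keys() & model_shapes.keys())
        if ckpt_shapes[name] != model_shapes[name]
    ]


if __name__ == "__main__":
    # HYPOTHETICAL keys: a projection trained for a 4096-wide LLM (7B)
    # versus the 5120-wide layer a 13B LLM would need.
    ckpt = {"llama_proj.weight": (4096, 768), "query_tokens": (1, 32, 768)}
    model = {"llama_proj.weight": (5120, 768), "query_tokens": (1, 32, 768)}
    for name, old, new in find_shape_mismatches(ckpt, model):
        print(f"{name}: checkpoint {old} vs model {new}")
```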

CUCldyyyyy commented 5 months ago

Hey! I need to run inference on MSVD-QA and am wondering how to change the original mvbench.ipynb file to fit this open-ended dataset. It seems to be designed for multiple-choice tasks, and the prompt needs to be revised for open-ended tasks. Could you explain how to reproduce the results on MSVD-QA? Thanks a lot!

Andy1621 commented 5 months ago

Please check the code in Video_ChatGPT.
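For open-ended sets such as MSVD-QA, Video-ChatGPT's quantitative evaluation, as far as we understand it, asks a GPT model to compare each predicted answer with the ground truth and return a yes/no correctness judgement plus a 0-5 score, then reports accuracy and the average score. The sketch below covers only the aggregation step, assuming each judge reply is a dict-like string such as `{'pred': 'yes', 'score': 4}`; take the actual judge prompt and output format from the Video_ChatGPT code itself.

```python
import ast
from typing import Dict, Iterable, Tuple


def parse_judgement(raw: str) -> Dict[str, object]:
    # ASSUMED judge output: a Python-dict string such as "{'pred': 'yes', 'score': 4}".
    result = ast.literal_eval(raw.strip())
    if not isinstance(result, dict) or "pred" not in result or "score" not in result:
        raise ValueError(f"unexpected judge output: {raw!r}")
    return result


def aggregate(raw_outputs: Iterable[str]) -> Tuple[float, float]:
    """Return (accuracy, mean score) over all judged answers."""
    n = yes = 0
    score_sum = 0.0
    for raw in raw_outputs:
        j = parse_judgement(raw)
        n += 1
        # Count an answer as correct only if the judge said "yes".
        yes += str(j["pred"]).strip().lower() == "yes"
        score_sum += float(j["score"])
    if n == 0:
        raise ValueError("no judged answers")
    return yes / n, score_sum / n
```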