OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
https://vchat.opengvlab.com/
MIT License

Best option:( doesn't work #238

Closed aloe101 closed 1 month ago

aloe101 commented 1 month ago

Hi all, thanks for your amazing work. I am a beginner in LLMs. I added 'Best option:(' to other video LLMs and it works well, but it doesn't work for VideoChat2.

I modified the code in mvbench.ipynb and applied it to my own dataset. I used videochat2_it_vicuna and downloaded the corresponding model weights:

from models.videochat_vicuna.videochat2_it_vicuna import VideoChat2_it_vicuna

And here is the prompt, which shows that I have added it (printed in debug mode):

###Human: <Video><VideoHere></Video>
###Human: Carefully observe the video and choose the best option for the question.
What shoes is the person wearing?, A: Knee-high socks with flashes, B: White socks, C: Hiking boots, D: Barefoot
Only give the best option.
###Assistant: Best option:(
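
For reference, a prompt like the one above can be assembled with a small helper along these lines. This is a hypothetical sketch of the construction (build_prompt is not the repo's own helper); the answer prefix "Best option:(" is appended after the "###Assistant:" role tag so the model is nudged to continue with a single option letter.

def build_prompt(question, options, answer_prompt="Best option:("):
    # Join the options as "A: ..., B: ..., C: ..., D: ..."
    option_str = ", ".join(
        f"{chr(ord('A') + i)}: {opt}" for i, opt in enumerate(options)
    )
    return (
        "###Human: <Video><VideoHere></Video>\n"
        "###Human: Carefully observe the video and choose the best option for the question.\n"
        f"{question}, {option_str}\n"
        "Only give the best option.\n"
        f"###Assistant: {answer_prompt}"
    )

print(build_prompt(
    "What shoes is the person wearing?",
    ["Knee-high socks with flashes", "White socks", "Hiking boots", "Barefoot"],
))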

But the result is:

'K)\nHuman:\nHuman:\nHuman:\nHuman:\nHuman:\nH:\nH:\nH:\n\nH:\n\nH:\n\n\nH:\n\n\nH:\n\n\n\nH:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n</s>'

Not only that: when generating a video description for the example video you provided, I could only get <s> as the output, even though that code is copied directly from mvbench.ipynb.

May I know if there is anything I need to take care of besides that? Thanks in advance :)

aloe101 commented 1 month ago

I resolved the issue: it was because I forgot to apply the delta weights to the Vicuna model. Thanks :)
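
For anyone hitting the same symptom, below is a rough sketch of what the delta-merging step does. The paths are placeholders, and FastChat's fastchat.model.apply_delta script is the recommended way to do this in practice (it also handles edge cases such as the resized embedding in the v0 delta, which this sketch does not).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "/path/to/llama-7b"          # original LLaMA weights (placeholder)
delta_path = "/path/to/vicuna-7b-delta"  # released Vicuna delta checkpoint (placeholder)
target_path = "/path/to/vicuna-7b"       # merged checkpoint that VideoChat2 should load

# Load the delta and the base model in fp16 to keep memory manageable.
delta_tokenizer = AutoTokenizer.from_pretrained(delta_path, use_fast=False)
delta = AutoModelForCausalLM.from_pretrained(
    delta_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
)
base = AutoModelForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
)

# The delta checkpoint stores (vicuna - llama) for each parameter, so adding
# the delta back onto the base weights reconstructs the Vicuna weights.
delta_state = delta.state_dict()
for name, param in base.state_dict().items():
    assert name in delta_state, f"missing parameter in delta: {name}"
    param.data += delta_state[name]

base.save_pretrained(target_path)
delta_tokenizer.save_pretrained(target_path)

Without this merge, the model generates from raw delta (or raw LLaMA) weights, which is why the output degenerates into repeated tokens like the "Human:" loop above.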