Hi all, thanks for your amazing work. I am a beginner with LLMs. I appended 'Best option:(' to the assistant turn of the prompt for other video LLMs and it works well, but it doesn't work for VideoChat2.
I modified the code in mvbench.ipynb and applied it to my own dataset. I used videochat2_it_vicuna and downloaded the corresponding model weights:
from models.videochat_vicuna.videochat2_it_vicuna import VideoChat2_it_vicuna
Here is the prompt, printed in debug mode, which shows that the prefix has been added:
###Human: <Video><VideoHere></Video>
###Human: Carefully observe the video and choose the best option for the question.
What shoes is the person wearing?, A: Knee-high socks with flashes, B: White socks, C: Hiking boots, D: Barefoot
Only give the best option.
###Assistant: Best option:(
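
For anyone trying to reproduce this, here is a minimal sketch of how a prompt in the layout above could be assembled and how the option letter could be read back from the generated text. The helper names `build_prompt` and `parse_option` are hypothetical and are not part of VideoChat2 or mvbench.ipynb; the sketch only assumes the model continues the text after 'Best option:(' and may emit special tokens such as <s>.

```python
import re


def build_prompt(question, options, answer_prefix="Best option:("):
    """Assemble a multiple-choice prompt in the layout printed above.

    `options` is a list of (label, text) pairs, e.g. ("A", "White socks").
    """
    option_text = ", ".join(f"{label}: {text}" for label, text in options)
    return (
        "###Human: <Video><VideoHere></Video>\n"
        "###Human: Carefully observe the video and choose the best option "
        "for the question.\n"
        f"{question}, {option_text}\n"
        "Only give the best option.\n"
        f"###Assistant: {answer_prefix}"
    )


def parse_option(generated, labels="ABCD"):
    """Return the leading option letter in the model output, or None.

    Special tokens are stripped first, so an output of just "<s>"
    is reported as None instead of being mistaken for an answer.
    """
    cleaned = generated.replace("<s>", "").replace("</s>", "").strip()
    match = re.match(rf"\(?([{labels}])\b", cleaned)
    return match.group(1) if match else None


if __name__ == "__main__":
    prompt = build_prompt(
        "What shoes is the person wearing?",
        [
            ("A", "Knee-high socks with flashes"),
            ("B", "White socks"),
            ("C", "Hiking boots"),
            ("D", "Barefoot"),
        ],
    )
    print(prompt)
    print(parse_option("<s>"))
```

A `None` result from `parse_option` makes it easy to count how many samples produced no usable answer, which helps separate a prompt-format problem from a weight-loading problem.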
Not only that, when generating a video description for the example video you provided, the only output I get is <s>, even though this code is copied directly from mvbench.ipynb.
Is there anything else I need to take care of? Thanks in advance :)