DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License
2.77k stars · 255 forks

Multiple Video-Text pair Support #129

Open mustafaadogan opened 10 months ago

mustafaadogan commented 10 months ago

Hello!

First of all, I'd like to congratulate you on your great work. I have a question: I'm looking to evaluate the model's performance in a different way by using in-context examples. Specifically, I'm interested in feeding the model multiple in-context video-text examples. Is it possible to do so?
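One way to approximate this, assuming the model's conversation template marks where video features are spliced in, is to interleave several (video placeholder, caption) demonstrations in a single text prompt before the query video. The sketch below is hypothetical: the placeholder token `<Video><ImageHere></Video>` follows the pattern used in Video-LLaMA's demo prompts, but the helper name and the multi-video behavior are assumptions, since the released model may not have been trained on prompts containing more than one video.

```python
# Hypothetical sketch: build a few-shot prompt by interleaving video
# placeholders with their captions, then appending the query.
# Each placeholder would later be replaced by that video's encoded
# features when the inputs are assembled for the LLM; whether the
# model handles multiple videos per prompt is untested here.

VIDEO_PLACEHOLDER = "<Video><ImageHere></Video>"  # assumed token pattern

def build_few_shot_prompt(example_captions, query_instruction):
    """Interleave in-context (video, caption) demonstrations,
    then append the final query video and instruction."""
    parts = [f"{VIDEO_PLACEHOLDER} {cap}" for cap in example_captions]
    parts.append(f"{VIDEO_PLACEHOLDER} {query_instruction}")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    ["A dog catches a frisbee in a park.",
     "A chef slices vegetables on a cutting board."],
    "Describe what happens in this video.",
)
print(prompt)
```

Even if the prompt is accepted, results may be out of distribution for a model fine-tuned on single-video instructions, so this is only a starting point for the kind of evaluation described above.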

hieuchi911 commented 6 months ago

Hi @mustafaadogan, did you find a way to do that?