DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Evaluation on large-scale dataset #151

Open hritam-98 opened 6 months ago

hritam-98 commented 6 months ago

Hello, thank you for your amazing work. The demo runs fine for a single video.

Is there any provision for running inference on a larger dataset of videos, each paired with its own text questions? I'd also like to know whether an API is available for this purpose.

Looking forward to your insights on this matter.

Cece1031 commented 3 months ago

I need this too! Do you know how to evaluate on a large-scale dataset now?
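
For anyone landing here with the same question: the thread does not document a batch API, so the following is only a minimal sketch of the generic loop one might wrap around whatever single-video inference call the demo already uses. Everything in it is an assumption made for illustration: the JSONL manifest layout (`video` and `question` keys), the `run_batch` helper, and the `answer_fn` callable, which stands in for your own wrapper around the model and is not a Video-LLaMA function.

```python
import json
from pathlib import Path
from typing import Callable


def run_batch(
    manifest_path: str,
    output_path: str,
    answer_fn: Callable[[str, str], str],
) -> int:
    """Run answer_fn on every (video, question) pair in a JSONL manifest.

    Assumed manifest line (hypothetical layout):
        {"video": "path/to/clip.mp4", "question": "What happens?"}
    answer_fn is a user-supplied wrapper around single-video inference;
    it is NOT part of any library. Returns the number of pairs processed.
    """
    out = Path(output_path)

    # Resume support: pairs already recorded by an earlier run are skipped,
    # including pairs that were recorded with an error.
    done = set()
    if out.exists():
        with out.open() as f:
            for line in f:
                rec = json.loads(line)
                done.add((rec["video"], rec["question"]))

    processed = 0
    with open(manifest_path) as src, out.open("a") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue
            item = json.loads(line)
            key = (item["video"], item["question"])
            if key in done:
                continue
            try:
                record = {**item, "answer": answer_fn(*key), "error": None}
            except Exception as exc:
                # One unreadable video should not abort the whole run.
                record = {**item, "answer": None, "error": str(exc)}
            # Write and flush per item so a crash loses at most one result.
            dst.write(json.dumps(record) + "\n")
            dst.flush()
            processed += 1
    return processed
```

Loading the model once outside the loop and closing over it in `answer_fn` keeps the per-video cost down to inference alone. The output file can then be scored against ground-truth answers with whatever metric the target benchmark defines.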