DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License
2.77k stars 255 forks

What is the input sample of the forward function in videollama #146

Open llx-08 opened 7 months ago

llx-08 commented 7 months ago

Hi, I'm wondering what the input sample of the forward function in videollama.py looks like.

It seems to take a dict() with image and text_input as its keys, but I can't find any example of its usage. I also checked the inference process in demo_audiovideo.py, and it differs from the forward process. Could you provide an example of how to use the forward function in videollama? Thank you very much!
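For readers with the same question, here is a minimal sketch of how such a dict might be assembled. The thread only establishes the two keys, image and text_input. The rest is an assumption that should be checked against videollama.py: the 4-D or 5-D batch layout (batch, channels, frames, height, width for video), the one-string-per-item rule, and the names build_samples and ShapeOnly, which are hypothetical helpers and not part of the repository.

```python
from typing import Any, Dict, List, Tuple


class ShapeOnly:
    """Hypothetical stand-in for a tensor: it only carries a shape."""

    def __init__(self, *shape: int) -> None:
        self.shape: Tuple[int, ...] = tuple(shape)


def build_samples(video: Any, captions: List[str]) -> Dict[str, Any]:
    """Pack one batch into the dict layout described in the question.

    Only the keys "image" and "text_input" come from the thread; the
    shape checks below are assumptions, not confirmed behaviour.
    """
    shape = tuple(video.shape)
    # Assumed layouts: (b, c, h, w) for images, (b, c, t, h, w) for video.
    if len(shape) not in (4, 5):
        raise ValueError(f"expected a 4-D or 5-D batch, got shape {shape}")
    # Assumed: one text string per item in the batch.
    if shape[0] != len(captions):
        raise ValueError("need exactly one text string per batch item")
    return {"image": video, "text_input": list(captions)}


# Two clips, 3 channels, 8 frames, 224x224 pixels (illustrative numbers).
samples = build_samples(
    ShapeOnly(2, 3, 8, 224, 224),
    ["a dog runs on the beach", "a man cooks dinner"],
)
print(sorted(samples))
```

With PyTorch installed, a real tensor of the same shape could presumably take the place of ShapeOnly, and the resulting dict would be what gets passed to the model's forward call. Whether forward accepts exactly this layout is not confirmed anywhere in this thread.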

EQ3000 commented 6 months ago

I am also looking for a solution to this!