DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Change the number of frames and the query_tokens size #128

Open AllenFind opened 11 months ago

AllenFind commented 11 months ago

Hi,

Thanks a lot for your great work!

I am wondering whether we can change the number of sampled frames and the query_tokens size.