DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

How to increase the number of input frames? #155

Open onlyonewater opened 7 months ago

onlyonewater commented 7 months ago

Hi authors, I want to use Video-LLaMA to run inference on my own dataset. The current framework seems to support at most 32 input frames: if I set the number of frames in the config to more than 32, an error is raised. How can I use more than 32 frames?

Thanks!

EQ3000 commented 7 months ago

I need this too, thanks!

onlyonewater commented 7 months ago

It seems the number of frames cannot exceed 32, since the input dimension of the checkpoint provided by the authors is 32 × 768.
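To illustrate why a fixed 32 × 768 input would cap the frame count, here is a minimal sketch. It assumes that the shape refers to a learned table with one 768-dimensional row per frame position. Under that assumption, looking up frame 32 or later has no row to read. The sketch also shows one workaround that is sometimes used for learned position tables: linearly resample the 32 rows to a longer length, as is done for ViT position embeddings. This is not something the Video-LLaMA authors provide, and all names here (`lookup`, `interpolate_table`, `MAX_FRAME_POS`) are hypothetical. The resampled rows were never trained, so the model would probably still need fine-tuning at the new length.

```python
# Hypothetical sketch: stretching a learned frame-position table beyond 32 rows.
# Assumes the checkpoint's limit comes from a 32 x 768 table with one learned
# row per frame position; names are illustrative, not Video-LLaMA's actual API.
from typing import List

Table = List[List[float]]

MAX_FRAME_POS = 32   # rows in the pretrained table (assumed from the 32 x 768 shape)
HIDDEN_SIZE = 768    # width of each row


def lookup(table: Table, frame_idx: int) -> List[float]:
    """Fetch the embedding for one frame; fails past the last learned row."""
    if frame_idx >= len(table):
        raise IndexError(f"frame {frame_idx} exceeds {len(table)} learned positions")
    return table[frame_idx]


def interpolate_table(table: Table, new_len: int) -> Table:
    """Linearly resample an (old_len x dim) table to (new_len x dim)."""
    old_len = len(table)
    out: Table = []
    for i in range(new_len):
        # Map new position i onto the old range [0, old_len - 1].
        t = i * (old_len - 1) / (new_len - 1) if new_len > 1 else 0.0
        lo = int(t)
        hi = min(lo + 1, old_len - 1)
        w = t - lo  # weight of the upper neighbour
        out.append([(1 - w) * a + w * b for a, b in zip(table[lo], table[hi])])
    return out


# Toy table: every entry of row p equals p, so interpolated values are easy to check.
pretrained = [[float(p)] * HIDDEN_SIZE for p in range(MAX_FRAME_POS)]

try:
    lookup(pretrained, 40)
except IndexError as err:
    print(err)  # frame 40 exceeds 32 learned positions

stretched = interpolate_table(pretrained, 64)
print(len(stretched), len(stretched[0]))  # 64 768
```

Linear interpolation keeps the first and last learned rows unchanged and spreads the intermediate ones evenly across the new length. Because of that, neighbouring frames still get similar embeddings, which is why this approach tends to work better than padding with randomly initialised rows.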