DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License
2.7k stars 243 forks

What if no frame_position_embeddings? #158

Status: Open. LetsGoFir opened this issue 4 months ago

LetsGoFir commented 4 months ago

If frame_position_embeddings are not used, will the performance be worse?