[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License
What if no frame_position_embeddings? #158
Open
LetsGoFir opened 6 months ago
Will performance be worse if the model is run without frame_position_embeddings?
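
For context, here is a toy sketch of what per-frame position embeddings contribute in general. It is not the repository's code: `attend`, `add_positions`, and the values in `pos_table` are hypothetical, and a single-query softmax attention stands in for the real video-level attention. It only shows that attention over frame features ignores frame order unless a position vector is added to each frame slot. It says nothing about how much accuracy would change.

```python
import math

def attend(query, frames):
    """Single-query softmax attention over a list of frame feature vectors."""
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in frames]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) / total
            for d in range(dim)]

def add_positions(frames, pos_table):
    """Add the position vector of slot t to whichever frame sits in slot t."""
    return [[f + p for f, p in zip(frame, pos_table[t])]
            for t, frame in enumerate(frames)]

def close(a, b, tol=1e-9):
    """Element-wise comparison with a small tolerance for float rounding."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

# Hypothetical toy inputs: three frames, three feature dimensions.
query = [1.0, 0.5, -0.5]
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reversed_frames = frames[::-1]  # same clip played backwards
pos_table = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]]

# Without position vectors, forward and reversed clips give the same output.
no_pos_same = close(attend(query, frames), attend(query, reversed_frames))

# With position vectors, the two orderings become distinguishable.
with_pos_same = close(
    attend(query, add_positions(frames, pos_table)),
    attend(query, add_positions(reversed_frames, pos_table)),
)

print("order-invariant without positions:", no_pos_same)
print("order-invariant with positions:", with_pos_same)
```

In this toy setting, dropping the position vectors removes the only signal that distinguishes a clip from its reversed version. Whether that matters in practice would depend on how much the downstream task relies on temporal order.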