DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Interesting prompt template #126

Closed by tian1327 11 months ago

tian1327 commented 11 months ago

Thanks for the great work! Could you shed some light on the prompt template used in the model? I find it particularly interesting that the prompt refers to "eyes" and "ears" to guide the model to focus on different features. How did you come up with such prompts? Did you see them used in other works?

"Close your eyes, open your ears and you imagine only based on the sound that: . \ Close your ears, open your eyes and you see that . \ Now answer my question based on what you have just seen and heard."

hangzhang-nlp commented 11 months ago

Thank you for your kind words and for your intriguing question! The prompt template you mentioned, involving "eyes" and "ears," is quite interesting and serves as a creative way to guide the LLM to distinguish different sensory features. Regarding the origin of this prompt template, it's actually an innovation developed by our team.
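
For readers who want to experiment with this style of prompt, here is a minimal sketch of how such a template could be assembled. It is not the repository's code: the function name `build_av_prompt` and the two slot strings are hypothetical, since the quoted template above does not show which tokens mark where the audio and video features are inserted.

```python
# Hypothetical slot markers. The quoted template does not show the actual
# tokens used by the model, so these strings are assumptions for illustration.
AUDIO_SLOT = "<AUDIO_FEATURES>"
VIDEO_SLOT = "<VIDEO_FEATURES>"


def build_av_prompt(question: str,
                    audio_slot: str = AUDIO_SLOT,
                    video_slot: str = VIDEO_SLOT) -> str:
    """Assemble an "eyes and ears" style prompt around two modality slots."""
    parts = [
        # Audio-only framing: the model is told to rely on sound alone.
        "Close your eyes, open your ears and you imagine only based on "
        f"the sound that: {audio_slot}.",
        # Visual-only framing: the model is told to rely on sight alone.
        f"Close your ears, open your eyes and you see that {video_slot}.",
        # Closing instruction that asks the model to combine both modalities.
        "Now answer my question based on what you have just seen and heard.",
        # The user's question goes last, with surrounding whitespace removed.
        question.strip(),
    ]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_av_prompt("What is happening in this clip?"))
```

Keeping the audio and visual sentences as separate parts makes it easy to drop one of them when only a single modality is available, which is one way to test how much each framing sentence actually matters.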