DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Is Video-LLaMA capable of comprehending videos that have faces surrounded by bounding boxes (face recognition)? #150

Open PhilipAmadasun opened 5 months ago

PhilipAmadasun commented 5 months ago

Is Video-LLaMA capable of comprehending videos that have faces surrounded by bounding boxes (face recognition)?

If I asked Video-LLaMA to describe what each person in a video is doing and to identify each of them by the name shown on the bounding box around their face, would it be able to do so?