dvlab-research / LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Apache License 2.0
693 stars, 43 forks

Is it possible to run inference on long videos without using subtitles? #38

Closed Deaddawn closed 8 months ago

Deaddawn commented 9 months ago

Hi there. I am wondering whether it is possible to run inference on long videos without using subtitles, just as with short videos?

yanwei-li commented 9 months ago

Hi, of course, you can run inference on long videos without subtitles. In that case, however, the model cannot know the name of each character or exactly what happens along the video's timeline. It may therefore give generic answers such as "a man is doing xxx and then he xxx", which can lose the storyline of the video.