DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
464 stars 27 forks

The title of the paper behind the link is not that of the link text #2

Open PromptExpert opened 4 weeks ago

PromptExpert commented 4 weeks ago

Link text: VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Actual title of the paper: Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

lixin4ever commented 4 weeks ago

Thanks for your attention.

The title you mentioned above refers to the first version of VideoLLaMA, while this repo is for the second version (i.e., VideoLLaMA 2). We are still drafting the technical report for VideoLLaMA 2, and it should be available by the end of this week.

MoonBlvd commented 3 weeks ago

Looking forward to seeing the VideoLLaMA 2 report/paper! I'm wondering how it differs from VideoLLaMA.

lixin4ever commented 3 weeks ago

@MoonBlvd Glad to hear this! The main differences from VideoLLaMA 1 are improved architectural designs, stronger models (which will be open-sourced for sure), and a much more user-friendly codebase for training and evaluating Video-LLMs (which I think is the most beneficial part for the community). Please stay tuned :grin: