DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

VideoLLaMA2 performance gap on video benchmarks #92

Closed zhuqiangLu closed 3 weeks ago

zhuqiangLu commented 2 months ago

Hi,

Please correct me if I am wrong. According to this comment, only the vllava dataset is currently available, while the reported performance comes from a model trained on a different dataset.

There seems to be a huge performance gap between the same model trained on the two different datasets (compare Table 1 and Table 5).

Since training VideoLLaMA2 is rather expensive, could you please provide the performance of VideoLLaMA2 on each benchmark (apart from those already listed in Table 1)?

Best

lixin4ever commented 3 weeks ago

As discussed in https://github.com/DAMO-NLP-SG/VideoLLaMA2/issues/81, this issue has been resolved, so I'm going to close it.