When I evaluated on the ‘TGIF_Zero_Shot_QA’ dataset, the accuracy was only 13%. Do I need to train the model first to reach the 70% accuracy reported in the paper?

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

https://arxiv.org/pdf/2311.10122.pdf

Apache License 2.0

2.88k stars, 207 forks

Open FanshuoZeng opened 1 month ago