PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0
2.88k stars 207 forks source link

When I evaluated the ‘TGIF_Zero_Shot_QA’ dataset, the accuracy was only 13%. Should I train first to achieve the 70% accuracy in the paper? #188

Open FanshuoZeng opened 1 month ago