【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
2.88k stars, 207 forks
When I evaluated on the 'TGIF_Zero_Shot_QA' dataset, the accuracy was only 13%. Do I need to train the model first to reach the 70% accuracy reported in the paper? #188
Open
FanshuoZeng opened 1 month ago