PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0
2.99k stars, 218 forks

Video checkpoint is broken #3

Closed by NoSavedDATA 12 months ago

NoSavedDATA commented 12 months ago

I've installed the requirements, but the model outputs random words when I run the video examples in Gradio. Image captioning works fine, and the Hugging Face demo for video captioning works fine as well.

LinB203 commented 12 months ago

Sorry about that. I'll check and fix it.

LinB203 commented 12 months ago

Now it is available!