PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0
3.02k stars, 220 forks

Error when running the inference script #199

Open crystal-ltf opened 6 days ago

crystal-ltf commented 6 days ago

ValueError: The input provided to the model are wrong. The number of videos tokens is 0 while the number of videos given to the model is 1. This prevents correct indexing and breaks batch generation.
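A possible reading of the message above, assuming it is raised by the Hugging Face transformers port of Video-LLaVA: the model counts video placeholder tokens in the tokenized prompt and compares that count with the number of videos passed in. A count of 0 would then mean the prompt text contained no video placeholder. The sketch below only illustrates that kind of check on the raw prompt string. The helper name `check_video_placeholders` and the placeholder string `<video>` are assumptions for illustration, not taken from the issue.

```python
def check_video_placeholders(prompt: str, num_videos: int,
                             placeholder: str = "<video>") -> int:
    """Hypothetical pre-flight check: compare placeholder count with video count.

    `placeholder` is assumed to be "<video>"; adjust it to whatever token
    the processor in use actually expects.
    """
    found = prompt.count(placeholder)
    if found != num_videos:
        raise ValueError(
            f"Prompt contains {found} '{placeholder}' placeholder(s) "
            f"but {num_videos} video(s) were provided."
        )
    return found


# A prompt with one placeholder matches one video.
ok_prompt = "USER: <video>\nWhat happens in this clip? ASSISTANT:"
matched = check_video_placeholders(ok_prompt, num_videos=1)

# A prompt without a placeholder reproduces the mismatch (0 vs 1).
bad_prompt = "USER: What happens in this clip? ASSISTANT:"
try:
    check_video_placeholders(bad_prompt, num_videos=1)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

If this reading applies, adding the placeholder to the prompt before calling the processor may resolve the mismatch. Without the full script and traceback, that remains a guess.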