PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0
3.03k stars, 220 forks

Question Regarding Video Frame Processing #169

Open Kkkaystone opened 5 months ago

Kkkaystone commented 5 months ago

I have read your paper and found it very insightful. However, I am still unclear about how videos are processed. The paper mentions extracting 8 frames from each video, but it does not say what happens to them next. Are these 8 frames concatenated into a single image, or are they processed in some other way?
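To make the question concrete, here is a minimal sketch of the two readings I have in mind. It is only an illustration of the question, not a description of what Video-LLaVA actually does: the function names, the midpoint sampling rule, the 224×224 frame size, and the 4-column grid are all my own assumptions and are not taken from the paper or the repository.

```python
def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick num_frames indices spread evenly over a clip.

    Assumption: the clip is split into equal segments and the frame at
    the midpoint of each segment is taken. The paper may sample differently.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    segment = total_frames / num_frames
    return [
        min(total_frames - 1, int(segment * i + segment / 2))
        for i in range(num_frames)
    ]


def grid_shape(num_frames: int, channels: int, height: int, width: int,
               cols: int = 4) -> tuple[int, int, int]:
    """Reading A (hypothetical): tile all frames into ONE large image.

    Result layout: (C, rows * H, cols * W).
    """
    rows = -(-num_frames // cols)  # ceiling division
    return (channels, rows * height, cols * width)


def stacked_shape(num_frames: int, channels: int, height: int,
                  width: int) -> tuple[int, int, int, int]:
    """Reading B (hypothetical): keep frames separate along a time axis.

    Result layout: (T, C, H, W).
    """
    return (num_frames, channels, height, width)


if __name__ == "__main__":
    print(uniform_frame_indices(80))       # indices of the 8 sampled frames
    print(grid_shape(8, 3, 224, 224))      # one tiled image
    print(stacked_shape(8, 3, 224, 224))   # a sequence of 8 frames
```

In reading A the model would see one 448×896 image, while in reading B it would see a sequence of eight 224×224 frames. Which of these (if either) matches your implementation?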

I would appreciate it if you could provide more details or point me to any additional resources or code examples that could help clarify this.

Thank you for your time and assistance.