[Question] Would it be possible to provide which 8 frames Video-LLaVA should use?

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

https://arxiv.org/pdf/2311.10122.pdf

Apache License 2.0


[Question] Would it be possible to provide which 8 frames Video-LLaVA should use? #17

Closed: olcaybuyan closed this issue 11 months ago

olcaybuyan commented 11 months ago

If I read your paper correctly, you're only using 8 frames of a video? Would it be possible to specify which 8 frames to use instead of just using the first 8 frames?

SidHard commented 11 months ago

You can adjust frame_id_list here: https://github.com/PKU-YuanGroup/Video-LLaVA/blob/main/llava/model/multimodal_encoder/languagebind/video/processing_video.py#L101
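A minimal sketch of what adjusting frame_id_list could look like. It assumes the default at the linked line picks evenly spaced indices across the whole video (np.linspace-style, rounded down) rather than the first 8 frames, and adds two hypothetical alternatives: indices chosen by the caller, or evenly spaced indices inside a time window. The function names and parameters (uniform_frame_ids, segment_frame_ids, select_frame_ids, custom_ids) are illustrative only and are not part of Video-LLaVA.

```python
from typing import List, Optional, Sequence


def uniform_frame_ids(total_frames: int, num_frames: int = 8) -> List[int]:
    """Evenly spaced frame indices from 0 to total_frames - 1, rounded down.

    Integer arithmetic gives approximately the same spacing as
    np.linspace(0, total_frames - 1, num_frames, dtype=int).
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    last = total_frames - 1
    return [(i * last) // (num_frames - 1) for i in range(num_frames)]


def segment_frame_ids(total_frames: int, fps: float, start_sec: float,
                      end_sec: float, num_frames: int = 8) -> List[int]:
    """Evenly spaced indices restricted to the window [start_sec, end_sec]."""
    start = int(start_sec * fps)
    end = min(int(end_sec * fps), total_frames - 1)
    if not 0 <= start <= end:
        raise ValueError("empty or invalid time window")
    # Windows shorter than num_frames frames yield repeated indices.
    return [start + i for i in uniform_frame_ids(end - start + 1, num_frames)]


def select_frame_ids(total_frames: int, num_frames: int = 8,
                     custom_ids: Optional[Sequence[int]] = None) -> List[int]:
    """Return caller-chosen indices if given, else the uniform default."""
    if custom_ids is None:
        return uniform_frame_ids(total_frames, num_frames)
    ids = [int(i) for i in custom_ids]
    if len(ids) != num_frames:
        raise ValueError(f"expected {num_frames} frame ids, got {len(ids)}")
    for idx in ids:
        if not 0 <= idx < total_frames:
            raise ValueError(f"frame id {idx} outside [0, {total_frames})")
    return ids
```

For a 300-frame video, select_frame_ids(300) returns [0, 42, 85, 128, 170, 213, 256, 299]. The resulting list would replace whatever is assigned to frame_id_list, so the rest of the preprocessing stays unchanged.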

LinB203 commented 11 months ago

Switch the video decoding backend from decord to opencv and adjust frame_id_list. We will provide more flexible code and a stronger video LLM in the next version.
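Reading arbitrary frame indices with OpenCV could look roughly like this. It is a minimal sketch assuming opencv-python is installed; read_frames_opencv and frame_count_opencv are hypothetical helpers, not functions in the repository. The sketch seeks with CAP_PROP_POS_FRAMES and converts OpenCV's BGR output to RGB. Seeking by frame index can be slow or imprecise for some codecs, and CAP_PROP_FRAME_COUNT may only be an estimate for some containers, so indices near the end of a video are worth checking.

```python
from typing import Any, List, Sequence


def read_frames_opencv(video_path: str, frame_ids: Sequence[int]) -> List[Any]:
    """Decode the given frame indices with OpenCV and return RGB arrays (H, W, 3)."""
    import cv2  # requires opencv-python; imported here so the module loads without it

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"could not open video: {video_path}")
    frames = []
    try:
        for idx in frame_ids:
            # Seek to the requested frame, then decode it.
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
            ok, frame = cap.read()
            if not ok:
                raise IOError(f"could not read frame {idx} from {video_path}")
            # OpenCV decodes to BGR; most vision models expect RGB.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    finally:
        cap.release()
    return frames


def frame_count_opencv(video_path: str) -> int:
    """Frame count reported by the container; may be an estimate for some formats."""
    import cv2

    cap = cv2.VideoCapture(video_path)
    try:
        return int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    finally:
        cap.release()
```

For example, read_frames_opencv("video.mp4", [0, 30, 60, 90, 120, 150, 180, 210]) would return eight RGB frames. These could then be stacked and passed to the existing transform, after matching whatever tensor layout that transform expects.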

olcaybuyan commented 11 months ago

Thanks

TousakaNagio commented 11 months ago

@LinB203 I am wondering when you will release the next version. Thanks!

LinB203 commented 11 months ago

> @LinB203 I am wondering when you will release the next version. Thanks!

Conservatively, about next month.

fightingaaa commented 11 months ago

> Switch the video decoding backend from decord to opencv and adjust frame_id_list. We will provide more flexible code and a stronger video LLM in the next version.

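Does this work by adding more frame tokens while keeping the number of frames fixed? What about sampling at a fixed fps instead, so that the number of frames varies?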
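The question contrasts two designs: a fixed frame count (8 frames in Video-LLaVA) and a fixed sampling rate, where the frame count, and therefore the number of visual tokens, grows with video length. Below is a minimal sketch of fixed-fps index selection with a hypothetical max_frames cap that keeps the token count within the LLM's context. The names fixed_fps_frame_ids and visual_token_count are hypothetical. The default of 256 tokens per frame is an assumption for illustration: a 224×224 input cut into 14×14 patches gives 16 × 16 = 256 patches. The real number depends on the encoder configuration.

```python
from typing import List, Optional


def fixed_fps_frame_ids(total_frames: int, native_fps: float, target_fps: float,
                        max_frames: Optional[int] = None) -> List[int]:
    """Sample one frame every native_fps / target_fps source frames.

    The number of returned indices grows with video length; max_frames,
    if given, caps it by keeping an evenly spaced subset.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("total_frames and both fps values must be positive")
    if max_frames is not None and max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    # A step below 1 would repeat frames, so never sample faster than the source.
    step = max(native_fps / target_fps, 1.0)
    ids: List[int] = []
    i = 0
    while int(i * step) < total_frames:
        ids.append(int(i * step))
        i += 1
    if max_frames is not None and len(ids) > max_frames:
        n = len(ids)
        if max_frames == 1:
            return [ids[0]]
        ids = [ids[(j * (n - 1)) // (max_frames - 1)] for j in range(max_frames)]
    return ids


def visual_token_count(num_frames: int, tokens_per_frame: int = 256) -> int:
    """Visual tokens handed to the LLM, assuming a fixed per-frame token budget."""
    return num_frames * tokens_per_frame
```

For a 20-second clip at 30 fps, sampling at 1 fps gives 20 frames, or 20 × 256 = 5120 visual tokens under that assumption, compared with 8 × 256 = 2048 for the fixed 8-frame setting. Without a cap, longer videos would eventually exceed the context window.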