LLaVA-VL / LLaVA-NeXT

Apache License 2.0

update video code #183

Closed: ZhangYuanhan-AI closed this pull request 2 months ago

ZhangYuanhan-AI commented 2 months ago

Update video code:

  1. Switch frame sampling from 1 fps to uniform sampling.
  2. Add new_line logic.
  3. Add faster token logic.
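The first change swaps one frame-sampling strategy for another. The sketch below contrasts the two. It is an illustration under stated assumptions, not the code from this PR. The function names `sample_fixed_rate` and `sample_uniform` are hypothetical. The uniform variant assumes linspace-style spacing between the first and last frame, and it assumes the budget is capped at the video length.

```python
from typing import List


def sample_fixed_rate(total_frames: int, video_fps: float, target_fps: float = 1.0) -> List[int]:
    """Hypothetical helper: keep one frame every 1/target_fps seconds (e.g. 1 fps).

    The number of returned frames grows linearly with video duration.
    """
    if total_frames <= 0 or video_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count and rates must be positive")
    # Source frames between consecutive samples; at least 1 so we always advance.
    stride = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, stride))


def sample_uniform(total_frames: int, num_frames: int) -> List[int]:
    """Hypothetical helper: pick num_frames indices evenly spaced from first to last frame.

    The frame budget is fixed regardless of duration, so the effective
    sampling rate (frames per second of video) varies from video to video.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("frame counts must be positive")
    # Assumption: never request more frames than the video contains.
    num_frames = min(num_frames, total_frames)
    if num_frames == 1:
        return [0]
    # Integer equivalent of linspace(0, total_frames - 1, num_frames) truncated to int.
    return [i * (total_frames - 1) // (num_frames - 1) for i in range(num_frames)]


if __name__ == "__main__":
    for seconds in (10, 100):
        n = seconds * 30  # assume a 30 fps source video
        print(seconds, "s:", len(sample_fixed_rate(n, 30.0)), "frames at 1 fps vs",
              len(sample_uniform(n, 8)), "frames uniformly sampled")
```

With a fixed rate, a 10 s clip yields 10 frames and a 100 s clip yields 100, so the visual-token count grows with duration. With uniform sampling, both clips yield 8 frames. The token budget stays constant, but the temporal density drops from 0.8 to 0.08 frames per second of video.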

kcz358 commented 2 months ago

Reproduced results

Image

LLaVA-OV-0.5B: AI2D, MME

[Screenshot: LLaVA-OV-0.5B results on AI2D and MME]

Video

LLaVA-OV-0.5B: VideoMME (w/o subtitles), MLVU

[Screenshot: LLaVA-OV-0.5B results on VideoMME (w/o subtitles) and MLVU]

All results match those reported in the papers.

ZhangYuanhan-AI commented 2 months ago

@Luodian

ehayeshaiper commented 2 months ago

Why sample uniformly? Don't we want the sampling to be consistent across different videos?