confusion about llava-next for multi-docvqa

EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

https://lmms-lab.github.io/

Open yayafengzi opened 2 months ago

yayafengzi commented 2 months ago

In multi-docvqa, a single sample can contain up to 20 images. Since llava-next doesn't compress image tokens, wouldn't this result in too many tokens?

Luodian commented 1 month ago

Your observation is correct: it would produce a very large number of image tokens.
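For a rough sense of scale, here is a back-of-the-envelope sketch. The numbers are assumptions, not something confirmed in this thread: a 336px CLIP ViT-L/14 encoder (24 x 24 = 576 tokens per view) and an AnyRes layout of 4 high-resolution tiles plus 1 downscaled base view per image. The actual count depends on the checkpoint, the image aspect ratio, and how padding and newline tokens are handled. The function names are hypothetical helpers, not part of lmms-eval.

```python
# Rough estimate of image tokens for a multi-image sample.
# All defaults are ASSUMPTIONS for illustration, not values taken from lmms-eval.

def image_tokens(tiles_per_image: int = 4,
                 patches_per_side: int = 24,
                 include_base_view: bool = True) -> int:
    """Estimate tokens for one image: each view yields patches_per_side**2 tokens."""
    tokens_per_view = patches_per_side ** 2
    views = tiles_per_image + (1 if include_base_view else 0)
    return views * tokens_per_view


def sample_tokens(num_images: int, **kwargs) -> int:
    """Estimate image tokens for a sample with num_images images."""
    return num_images * image_tokens(**kwargs)


if __name__ == "__main__":
    print(image_tokens())      # 2880 under the assumptions above
    print(sample_tokens(20))   # 57600 under the assumptions above
```

Under these assumptions, a 20-image sample needs tens of thousands of image tokens before any text is added, which can exceed the context window of many LLaVA-NeXT checkpoints.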