EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

confusion about llava-next for multi-docvqa #68

Open yayafengzi opened 2 months ago

yayafengzi commented 2 months ago

In multi-docvqa, a single sample can contain up to 20 images. Since LLaVA-NeXT doesn't compress image tokens, wouldn't this produce far too many tokens?

Luodian commented 1 month ago

Your observation is correct; it does produce a large number of image tokens.
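For context, a rough back-of-envelope sketch of the worst case. The numbers below are assumptions about LLaVA-NeXT's "anyres" scheme (a CLIP ViT-L/14 encoder at 336px giving a 24×24 = 576-token grid per crop, with each image split into a base view plus up to 4 high-resolution crops), not figures confirmed in this thread:

```python
# Back-of-envelope estimate of visual token counts for a multi-image
# sample under assumed LLaVA-NeXT "anyres" settings (hypothetical numbers,
# not taken from lmms-eval itself).

TOKENS_PER_CROP = 24 * 24      # 576 tokens from a 24x24 feature grid per 336px crop
MAX_VIEWS_PER_IMAGE = 1 + 4    # resized base image + up to 4 anyres grid crops

def estimate_visual_tokens(num_images: int) -> int:
    """Worst-case visual token count before any compression or pooling."""
    return num_images * MAX_VIEWS_PER_IMAGE * TOKENS_PER_CROP

# A 20-image multi-docvqa sample in the worst case:
print(estimate_visual_tokens(20))
```

Under these assumptions a 20-image sample can reach tens of thousands of visual tokens before the text prompt is even added, which is why uncompressed multi-image inputs strain the context window.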