Open yayafengzi opened 2 months ago
In multi-docvqa, a single sample can contain up to 20 images. Since llava-next doesn't compress tokens, wouldn't this result in too many tokens?
Your observation is correct: with up to 20 images in a single sample and no token compression, it would produce a large number of image tokens.
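For a rough sense of scale, here is a back-of-the-envelope sketch. It is not taken from the llava-next code. It assumes each image is split into up to 4 tiles of 336x336 pixels plus one downscaled base view, with a patch size of 14, and it ignores separator tokens and any padding removal. The function names and defaults are hypothetical, so adjust them to the actual model config.

```python
def tokens_per_view(tile_size: int = 336, patch_size: int = 14) -> int:
    """Number of visual tokens for one square view (assumed ViT-style patching)."""
    side = tile_size // patch_size
    return side * side


def estimate_image_tokens(
    num_images: int,
    tiles_per_image: int = 4,   # assumption: at most 4 high-resolution tiles
    include_base: bool = True,  # assumption: one extra downscaled base view
    tile_size: int = 336,
    patch_size: int = 14,
) -> int:
    """Upper-bound estimate of image tokens for one multi-image sample."""
    views = tiles_per_image + (1 if include_base else 0)
    return num_images * views * tokens_per_view(tile_size, patch_size)


if __name__ == "__main__":
    # 20 images * 5 views * 576 tokens per view
    print(estimate_image_tokens(20))  # prints 57600
```

Under these assumptions a 20-image sample would need tens of thousands of image tokens before any text is added, which is why the context length becomes a concern.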