Question about multi pages/images

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Apache License 2.0

Question about multi pages/images #58

Closed by sky-fly97 2 months ago

sky-fly97 commented 2 months ago

Could you tell me which model in this series can support multiple-page inputs?

HAWLYQ commented 2 months ago

Hi @sky-fly97, our released models have not been finetuned or evaluated on multi-page samples. (mPLUG-PaperOwl can understand multiple diagram images, but it is not scheduled for release in the near future.) Honestly, we're not sure whether the released models can handle multi-page input. You can try running inference with our docowl1.5 model.

sky-fly97 commented 2 months ago

Thanks, I will try. By the way, it seems that very few models on the market can handle multi-page input; I've only seen qwen-vl-chat and GPT4V so far.

HAWLYQ commented 2 months ago

Yes, there is still a lack of effective open-source methods for multi-image understanding.