Issue with reading documents with double columns

OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Apache License 2.0

Hi, thanks for the amazing work done on MiniCPM!

I would like to ask whether the model can extract text (via OCR or otherwise) from documents laid out in two columns, such as research papers, where each column is meant to be read top to bottom before moving on to the next one, rather than straight across the page. I experimented with several prompts, but the model does not seem to interpret double-column documents correctly. The output either omits one of the columns or merges a line from each column into a single line (reading across the page instead of down each column). I am not sure whether this can be mitigated, so any advice would be appreciated. Thanks!
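
To make the behaviour concrete, here is a minimal sketch with toy data. It does not call MiniCPM-V at all; the column contents and the function names (`expected_order`, `interleaved_order`, `dropped_column`) are made up purely to illustrate the reading order I expect versus the two kinds of output I am seeing.

```python
# Toy two-column page: each list holds one column's lines, top to bottom.
# The contents are placeholders, not real model input or output.
left_column = ["L1", "L2", "L3"]
right_column = ["R1", "R2", "R3"]


def expected_order(left, right):
    """Desired reading order: the whole left column, then the whole right column."""
    return list(left) + list(right)


def interleaved_order(left, right):
    """Observed failure 1: each output line merges one line from each column."""
    return [f"{l} {r}" for l, r in zip(left, right)]


def dropped_column(left, right):
    """Observed failure 2: one column (here the right one) is omitted entirely."""
    return list(left)


if __name__ == "__main__":
    print("expected:   ", expected_order(left_column, right_column))
    print("interleaved:", interleaved_order(left_column, right_column))
    print("dropped:    ", dropped_column(left_column, right_column))
```

The first function is what I would like the extracted text to look like; the other two mimic the results described above.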

Issue with reading documents with double columns #252