OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0
7.86k stars, 547 forks

Issue with reading documents with double columns #252

Closed Hastyrush closed 2 weeks ago

Hastyrush commented 3 weeks ago

Hi, thanks for the amazing work done on MiniCPM!

I would like to ask whether the model can extract text (via OCR or otherwise) from documents with a double-column layout, such as research papers, where each column is meant to be read top to bottom before moving on to the next one. I experimented with several prompts, but the model does not seem to interpret double-column documents correctly: it either omits one of the columns or merges lines from both columns, reading straight across the page instead of down each column. I'm not sure whether this can be mitigated, so any advice would be appreciated. Thanks!
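To make the two reading orders concrete, here is a minimal sketch in plain Python. It says nothing about how MiniCPM-V works internally. The `Line` type, the coordinates, and the "split at half the page width" rule are all assumptions made up for illustration.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    """A hypothetical text line with the position of its top-left corner."""
    text: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels


def horizontal_order(lines: List[Line]) -> List[str]:
    """Read row by row across the whole page (the unwanted behaviour)."""
    return [ln.text for ln in sorted(lines, key=lambda ln: (ln.y, ln.x))]


def column_order(lines: List[Line], page_width: float) -> List[str]:
    """Read the left column top to bottom, then the right column.

    Assumes exactly two columns separated at page_width / 2.
    """
    mid = page_width / 2
    left = sorted((ln for ln in lines if ln.x < mid), key=lambda ln: ln.y)
    right = sorted((ln for ln in lines if ln.x >= mid), key=lambda ln: ln.y)
    return [ln.text for ln in left + right]


# Made-up layout: two lines in each column of a 600-pixel-wide page.
lines = [
    Line("L1", 50, 100), Line("R1", 350, 100),
    Line("L2", 50, 120), Line("R2", 350, 120),
]

print(horizontal_order(lines))   # lines from both columns interleaved
print(column_order(lines, 600))  # left column first, then right column
```

The first function reproduces the interleaved output described above, while the second gives the order a human reader would follow.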

Cuiunbo commented 3 weeks ago

Can you give us an example or two so that we can get a clearer picture? Our model has some capacity for table extraction, but to make it perform very well in specific scenarios, it may require a small amount of data for fine-tuning.