OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0

MiniCPM vs Tesseract vs EasyOCR etc ... #178

Closed asusdisciple closed 1 month ago

asusdisciple commented 1 month ago

Hello, I saw your very compelling benchmark table. However, I wonder how your model performs compared to the "old" way of doing OCR, namely Tesseract and other solutions that are not necessarily based on deep learning.

Are there any benchmarks, or has anybody here run tests to get a general idea of how they compare in OCR? I can imagine MiniCPM being a fair bit better in terms of quality but probably also much slower.

Cuiunbo commented 1 month ago

Hi! This is a very interesting question. Traditional OCR systems are two-stage pipelines, sometimes with even more stages, which is very different from how we have implemented our VLM so far. With our current training approach (questions and answers described in natural language), there is no rigorous way to compare the model against traditional OCR models. A comparison would become possible if we fine-tuned our model or found natural-language ways to describe traditional OCR tasks (for example, OCRBench). Your work in this area is welcome, and I'm sure there are still some gaps here.
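For anyone who wants to try such a comparison informally, here is a minimal sketch of a harness that scores any OCR backend by character error rate (CER) and average time per image. It is not an official benchmark from this repository. The names `edit_distance`, `cer`, and `evaluate` are hypothetical helpers, and `ocr_fn` is an assumed wrapper you would write yourself around Tesseract, EasyOCR, or a MiniCPM-V prompt such as "transcribe the text in this image".

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, reference) / len(reference)


def evaluate(ocr_fn: Callable[[Any], str],
             samples: Sequence[Tuple[Any, str]]) -> Dict[str, float]:
    """Run ocr_fn over (image, reference_text) pairs.

    Returns the corpus-level CER and the mean wall-clock seconds per image.
    ocr_fn is an assumption: any callable that maps an image to a string.
    """
    total_edits = 0
    total_chars = 0
    start = time.perf_counter()
    for image, reference in samples:
        prediction = ocr_fn(image)
        total_edits += edit_distance(prediction, reference)
        total_chars += len(reference)
    elapsed = time.perf_counter() - start
    return {
        "cer": total_edits / max(total_chars, 1),
        "seconds_per_image": elapsed / max(len(samples), 1),
    }
```

Running `evaluate` once per backend on the same labeled images gives a rough quality and speed comparison. Because a VLM may paraphrase or add surrounding text rather than transcribe verbatim, its CER can look worse than its practical usefulness, which is part of why a rigorous comparison is hard.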