MiniCPM vs Tesseract vs EasyOCR etc ...

OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Apache License 2.0

7.97k stars 558 forks

Hi! This is a very interesting question. Traditional OCR systems are two-stage pipelines, sometimes with even more stages, which is very different from the way we have implemented our VLM so far. With our current training approach (questions and answers expressed as natural-language descriptions), there is no rigorous way to compare the model against traditional OCR models. A comparison could become possible if we fine-tune our model or find natural-language ways to describe traditional OCR tasks (for example, OCRBench). Your work in this area is welcome, and I'm sure there are still some gaps here.
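To make the comparison problem concrete, here is a minimal sketch of how one might score a traditional OCR engine's raw output and a VLM's free-form answer against the same ground-truth string. Everything in it is an assumption for illustration: the helper names (`normalize`, `char_error_rate`, `contains_answer`) and the sample strings are hypothetical, and this is not the evaluation used by MiniCPM-V or by any particular benchmark. Character error rate suits engines that return only the transcribed text, while a containment check is more forgiving of the extra wording a VLM tends to add around its answer.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not counted as errors."""
    return " ".join(text.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(prediction: str, reference: str) -> float:
    """Edit distance divided by reference length, after normalization."""
    pred, ref = normalize(prediction), normalize(reference)
    if not ref:
        return 0.0 if not pred else 1.0
    return edit_distance(pred, ref) / len(ref)


def contains_answer(prediction: str, reference: str) -> bool:
    """True if the normalized reference appears anywhere in the normalized prediction."""
    return normalize(reference) in normalize(prediction)


# Hypothetical sample data: an OCR engine confuses "0" with "O",
# while a VLM wraps the correct text in a full sentence.
reference = "Invoice No. 10234"
ocr_output = "Invoice No. 1O234"
vlm_answer = "The text in the image reads: Invoice No. 10234"

ocr_cer = char_error_rate(ocr_output, reference)
vlm_hit = contains_answer(vlm_answer, reference)
ocr_hit = contains_answer(ocr_output, reference)
```

Note that applying character error rate directly to `vlm_answer` would penalize the surrounding sentence, which is one reason the two kinds of model are hard to compare rigorously without a shared task format.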

MiniCPM vs Tesseract vs EasyOCR etc ... #178