Closed insinfo closed 1 month ago
You did not specify which model you used, and the image you posted is not segmented yet, so it is hard to tell to what extent the errors are caused by bad layout recognition. AFAIK there is no public model for Portuguese yet. The only modern model is uw3-modern-english, but naturally it would perform much worse on non-English text.
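To tell recognition errors apart from layout errors, it may help to crop a single text line by hand, run the model on it, and compare the output against a manual transcription. Below is a minimal sketch of a character error rate (CER) check in plain Python. It does not use any Calamari or OCR4all API; the function names and the sample strings are hypothetical and only illustrate the idea.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, prediction: str) -> float:
    """Edit distance divided by the length of the ground truth."""
    # Normalize so that accented letters such as "ã" count as one character
    # regardless of how they were encoded.
    ground_truth = unicodedata.normalize("NFC", ground_truth)
    prediction = unicodedata.normalize("NFC", prediction)
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, prediction) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical example: an English-only model dropping the diacritics.
    print(character_error_rate("ação", "acao"))
```

If the CER on a hand-cropped line is already high, the model is the likely culprit rather than the segmentation. A model trained only on English would be expected to miss characters such as ã, ç, and õ in particular.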
I ran a test to OCR scanned documents in Brazilian Portuguese, and calamari/ocr4all makes a lot of mistakes on them.
the correct thing would be